The Singapore Residents by Planning Area Subzone, Age Group, Sex and Type of Dwelling, June 2011-2019 data series provided by the Department of Statistics Singapore can be downloaded from https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data as the file ‘respopagesextod2011to2019.csv’. The challenges of working with the downloaded data and visualizing it are described in the following sub-sections.
On examining the downloaded data series in Microsoft Excel, we see that it contains data for Years 2011 to 2019 (see Figure 1 below). Since we are only interested in the latest data, for Year 2019, we need to keep only the Year 2019 rows and filter out the rest of the downloaded dataset.
Figure 1: Downloaded Data Series
Also, as we scroll through the list, we find slightly more than 300 subzones grouped into 55 planning areas. While the data is very detailed, visualizing so much detail in a static visualization would be a cognitive overload for the reader. Hence, the planning areas need to be grouped further into a smaller number of regions so that the reader can get a quick overview. However, the downloaded data contains no information on how the planning areas are grouped into regions, so this has to come from another source. Wikipedia offers a good source of information on this grouping: https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore
Since the ‘respopagesextod2011to2019.csv’ file can be conveniently opened and manipulated in Microsoft Excel, I used Data Filters to remove all data before Year 2019 and re-saved the file.
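The same filtering step could also be scripted instead of done by hand in Excel. The following is only a minimal sketch using the Python standard library, not the method actually used here. The column names (PA, SZ, AG, Sex, TOD, Pop, Time) and the sample rows are assumptions made for illustration, and the function name filter_year is hypothetical.

```python
import csv
import io

# Tiny inline sample standing in for 'respopagesextod2011to2019.csv'.
# ASSUMPTION: the column names and values below are illustrative only.
SAMPLE_CSV = """PA,SZ,AG,Sex,TOD,Pop,Time
Ang Mo Kio,Ang Mo Kio Town Centre,0_to_4,Males,HDB 3-Room Flats,10,2018
Ang Mo Kio,Ang Mo Kio Town Centre,0_to_4,Males,HDB 3-Room Flats,20,2019
Bedok,Bedok North,5_to_9,Females,HDB 4-Room Flats,30,2019
"""


def filter_year(rows, year, year_column="Time"):
    """Return only the rows whose year column equals the requested year."""
    return [row for row in rows if int(row[year_column]) == year]


# Read the sample into a list of dictionaries, one per CSV row.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Keep only the Year 2019 rows.
rows_2019 = filter_year(rows, 2019)
print(len(rows_2019))
```

For the real file, io.StringIO(SAMPLE_CSV) would be replaced by an opened file handle, and the filtered rows could be written back out with csv.DictWriter.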
Also, since the grouping of planning areas into regions is not included in the data series downloaded from the Department of Statistics Singapore, this information has to be inserted into the dataset. Wikipedia provides a structured table showing how planning areas are grouped into regions (see Figure 2 below). Hence, I used Python to read the grouping from Columns 1 and 6 of this table, and then modified the ‘respopagesextod2011to2019.csv’ file to include the regions.
Figure 2: Structured Table from Wikipedia with Grouping of Planning Areas into Regions
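Once the planning area and region columns have been read from the table, adding the region to each record is a simple lookup. The sketch below illustrates the idea with the Python standard library; it is not the script actually used. The lookup holds only a few example entries rather than the full table, and the column names PA and Region and the function name add_region are assumptions.

```python
# ASSUMPTION: a small illustrative excerpt of the planning-area-to-region
# lookup. The full table would have one entry per planning area.
REGION_BY_PLANNING_AREA = {
    "Ang Mo Kio": "North-East",
    "Bedok": "East",
    "Bishan": "Central",
    "Jurong East": "West",
    "Woodlands": "North",
}


def add_region(rows, lookup, area_column="PA", region_column="Region"):
    """Return copies of the rows with a region column added.

    Names are compared case-insensitively. Areas missing from the lookup get
    an empty region so that they can be spotted and fixed by hand.
    """
    normalized = {name.strip().lower(): region for name, region in lookup.items()}
    result = []
    for row in rows:
        key = row[area_column].strip().lower()
        result.append({**row, region_column: normalized.get(key, "")})
    return result


# Hypothetical records; the second one checks case-insensitive matching.
rows = [
    {"PA": "Ang Mo Kio", "Pop": "20"},
    {"PA": "BEDOK", "Pop": "30"},
    {"PA": "Unknown Area", "Pop": "5"},
]
tagged = add_region(rows, REGION_BY_PLANNING_AREA)
for row in tagged:
    print(row["PA"], "->", row["Region"] or "(no match)")
```

Leaving unmatched areas with an empty region, rather than raising an error, makes it easy to list every name that differs between the two sources before correcting them.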
I also transformed the data into a long format using Python. Finally, 2 transformed datasets were ready: ‘Long_Data2.csv’ and ‘Long_Data3.csv’ (see Figure 3 below).
Figure 3: ‘Long_Data2.csv’ and ‘Long_Data3.csv’
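The exact long-format transformation is not shown here. Because the plotting code in this report counts rows with geom_bar(), one plausible reading is that each row of the long data represents one resident. The sketch below illustrates that reading only: it repeats each aggregated row as many times as its count. The column names follow those used in the R code (Region, Planning_Area, Age_Group, Gender), while the Pop column, the sample values and the function name to_long are assumptions.

```python
def to_long(rows, count_column="Pop"):
    """Repeat each row as many times as its count, dropping the count column."""
    long_rows = []
    for row in rows:
        count = int(row[count_column])
        # Copy every field except the count itself.
        record = {key: value for key, value in row.items() if key != count_column}
        long_rows.extend(dict(record) for _ in range(count))
    return long_rows


# ASSUMPTION: illustrative aggregated records, not real figures.
aggregated = [
    {"Region": "East", "Planning_Area": "Bedok", "Age_Group": "0_to_4",
     "Gender": "Males", "Pop": "3"},
    {"Region": "East", "Planning_Area": "Bedok", "Age_Group": "0_to_4",
     "Gender": "Females", "Pop": "2"},
]
long_rows = to_long(aggregated)
print(len(long_rows))
```

With data in this shape, counting rows per age group gives the resident count directly, which is what a row-counting bar chart needs. The trade-off is file size, since the expanded data has one row per resident.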
There would be an overview visualization (see Figure 4 below) to enable the reader to appreciate the national demographics of Singapore (sorted from oldest to youngest age groups). Since the labels for the age groups are quite long, we should put them on the y-axis so that they can be displayed clearly and neatly. Since age groups are categorical in nature, and their counts are discrete, we could use a horizontal bar chart with the total count for each age group presented on each bar. Major and minor gridlines could also help the reader get a good sense of how one bar compares to another.
Figure 4: Sketch of Overview Visualization
After the reader is able to appreciate the overall national demographics, we could then present a similar horizontal bar chart (see Figure 5 below), with more information on gender. The horizontal stacked bar chart would have the stacked bars coloured differently for gender - Male and Female. The counts for each stacked bar will also be presented on the graph.
Figure 5: Sketch of Overall National Demographics by Gender
With an appreciation of the national demographics of Singapore by gender, readers can now dive deeper into the details at a region level. I envisaged 6 horizontal stacked bar charts (see Figure 6 below), one for each region - Central, East, North, North-East, South and West - visualized together on a common x-axis scale for the count of residents. This allows the reader to compare Singapore's demographics across regions. However, since 6 charts are now presented together, I would leave out the count displayed on each bar, so as not to clutter the visualization. I would also leave out visualizing the demographics by age groups and planning areas, as that would be too much detail for a reader to appreciate. Visualizing data by planning area could be done in a different way, as described in Section 1.4.4 below.
Figure 6: Sketch of Demographics by Region
So far, the demographics charts described above give information on age groups. We could use a different visualization to lead the reader into another perspective, this time, visualizing the total resident population by region (leaving out details on age groups). See Figure 7 below. Here, we could sort the regions in descending count of their total resident population so that the reader could appreciate which regions have larger population size.
Figure 7: Sketch of Resident Population by Region
Now, with the reader’s perspective changed to population by region, we can dive deeper into population by planning areas. Rather than a single visualization of resident population by planning areas, we could display 6 horizontal bar charts (see Figure 8 below) - one for each region. This visualization would help the reader to appreciate which regions the planning areas are grouped under and also the population sizes of the planning areas. The planning areas could be sorted by descending population count so that the reader is able to see which regions contain planning areas with larger resident populations.
Figure 8: Sketch of Proposed Design
The tidyverse package was loaded using the following lines:
# installing and loading the required libraries
packages = c('tidyverse')
for (p in packages) {
  # install the package only if it is not already available
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
The transformed datasets ‘Long_Data2.csv’ and ‘Long_Data3.csv’ are imported using the following lines:
#importing data from transformed Long_Data2.csv file
pop_data = read_csv("data/Long_Data2.csv")
#importing data from transformed Long_Data3.csv file
pop_data_pa = read_csv("data/Long_Data3.csv")
ggplot(data=pop_data, aes(x=Age_Group))
ggplot(data=pop_data, aes(x=Age_Group)) +
geom_bar()
ggplot(data=pop_data, aes(x=Age_Group)) +
geom_bar() +
coord_flip()
ggplot(data=pop_data, aes(x=Age_Group)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank())
ggplot(data=pop_data, aes(x=Age_Group)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
geom_text(stat='count', aes(label=..count..), color='light blue', hjust=1.1, size = 2.8)
library(scales)
ggplot(data=pop_data, aes(x=Age_Group)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
geom_text(stat='count', aes(label=..count..), color='light blue', hjust=1.1, size = 2.8) +
scale_y_continuous(labels = comma)
The final visualization above is described in Section 3.1.
1. We build on the code written for the overview chart by adding ‘fill=Gender’ to the aesthetic mapping in the ggplot function.
library(scales)
ggplot(data=pop_data, aes(x=Age_Group, fill=Gender)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
geom_text(stat='count', aes(label=..count..), color='light blue', hjust=1.1, size = 2.8) +
scale_y_continuous(labels = comma)
ggplot(data=pop_data, aes(x=Age_Group, fill = Gender)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
geom_text(stat='count', aes(label=..count..), color='black', position = position_stack(vjust = .5), size = 2.8) +
scale_y_continuous(labels = comma)
The final visualization above is described in Section 3.2.
ggplot(data=pop_data, aes(x=Age_Group, fill = Gender)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
geom_text(stat='count', aes(label=..count..), color='black', position = position_stack(vjust = .5), size = 2.8) +
scale_y_continuous(labels = comma) +
facet_wrap(~ Region)
ggplot(data=pop_data, aes(x=Age_Group, fill = Gender)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
scale_y_continuous(labels = comma) +
facet_wrap(~ Region)
The final visualization above is described in Section 3.3.
ggplot(data=pop_data_pa, aes(x= Region))
ggplot(data=pop_data_pa, aes(x= Region)) +
geom_bar()
ggplot(data=pop_data_pa, aes(x= Region)) +
geom_bar() +
coord_flip()
ggplot(data=pop_data_pa, aes(x= Region)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank())
ggplot(data=pop_data_pa, aes(x= Region)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
geom_text(stat='count', aes(label=..count..), color='blue', hjust=-0.1, size = 2.8)
## set the levels in the order that we want
pop_data_pa <- within(pop_data_pa,
Region <- factor(Region,
levels=names(sort(table(Region),
decreasing=FALSE))))
ggplot(data=pop_data_pa, aes(x= Region)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
geom_text(stat='count', aes(label=..count..), color='blue', hjust=-0.1, size = 2.8)
## set the levels in the order that we want
pop_data_pa <- within(pop_data_pa,
Region <- factor(Region,
levels=names(sort(table(Region),
decreasing=FALSE))))
ggplot(data=pop_data_pa, aes(x= Region)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
geom_text(stat='count', aes(label=..count..), color='blue', hjust=-0.1, size = 2.8) +
scale_y_continuous(labels = comma)
The final visualization above is described in Section 3.4.1.
## set the levels in the order that we want
pop_data_pa <- within(pop_data_pa,
Planning_Area <- factor(Planning_Area,
levels=names(sort(table(Planning_Area),
decreasing=FALSE))))
ggplot(data=pop_data_pa, aes(x= Planning_Area)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
geom_text(stat='count', aes(label=..count..), color='blue', hjust=0.5, size = 2.8) +
scale_y_continuous(labels = comma) +
facet_wrap(~ Region)
## set the levels in the order that we want
pop_data_pa <- within(pop_data_pa,
Planning_Area <- factor(Planning_Area,
levels=names(sort(table(Planning_Area),
decreasing=FALSE))))
ggplot(data=pop_data_pa, aes(x= Planning_Area, fill = Region)) +
geom_bar() +
coord_flip() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
geom_text(stat='count', aes(label=..count..), color='blue', hjust=0.5, size = 2.8) +
scale_y_continuous(labels = comma) +
facet_wrap(~ Region)
The final visualization above is described in Section 3.4.2.
The chart below presents the overall Singapore demographics in 2019, with the ‘active’ population [25 to 64 year-olds] forming the majority of the resident population and a shrinking ‘young’ population [0 to 24 year-olds].
The code is presented in Section 2.3.
The chart shows the Singapore demographics breakdown by gender in 2019. For the ‘old’ population [65 year-olds and older], the females outnumber the males. For the ‘young’ population, the males slightly outnumber the females.
The code is presented in Section 2.4.
The demographic structure of Singapore residents is further broken down into regions - Central, East, North, North-East, South and West. In the North-East region, the key difference from the overall demographic structure of Singapore is a higher proportion of children and youths aged 0 to 14 years old, which suggests that younger families reside in this region. Only a minority of Singapore residents reside in the South region.
The code is presented in Section 2.5.
The chart below presents Singapore’s 2019 resident population by region, sorted in descending order of size, with the North-East region having the largest resident population and the South region the smallest.
The code is presented in Section 2.6.
The graph below aims to show 2 aspects of information: (1) a sense of the number of planning areas in each region and (2) how each planning area compares with the others in terms of resident population.
From the previous graph, we see that the population size of the Central region is almost as large as that of the North-East region. However, the Central region consists of many more planning areas (albeit with smaller population sizes), while the North-East region consists of fewer planning areas with bigger population sizes.
The code is presented in Section 2.7.