This data visualization aims to reveal insights into the demographic structure of Singapore's population, using the data set “Singapore Residents by Planning Area, Age Group, Sex and Type of Dwelling, June 2011-2019” obtained from the Singapore Department of Statistics. It focuses on the demographic structure of each planning area by age group and gender in 2019.
Note: The visualization will only include planning areas with residents and/or where information is available.
2.1 Original Data Set does not allow easy analysis
The original data set is in a form that does not allow for easy analysis. Because the data appears as a large block of data points, it is difficult to discover trends or make comparisons, and the data cannot be visualized directly.
To solve this, the data first has to be transformed and processed. Once it is in a suitable shape, we can create plots that help in understanding trends, comparisons and insights about the demographics of the Singapore population.
2.2 No national and regional views
The data set contains no aggregation to the national or regional level, which makes it difficult to observe the demographics at a more macro level.
For such cases, the most useful visualization would be an interactive one that allows a user to swap between views. However, given that a static data visualization has been requested, the next best alternative is to create multiple views at (i) a national, (ii) a regional and (iii) a planning area level. This gives viewers different angles from which to look at the population demographics, depending on their needs.
2.3 Difficulty in comparing across populations due to different absolute sizes
The absolute number of individuals differs greatly across and within all levels (national, regional, planning area), which makes it difficult to compare population demographics properly. In visualizations based on absolute population numbers, the larger populations dominate the scale, so the plots for smaller planning areas appear skewed or too small to be compared.
To solve this problem, the visualizations should instead show the proportion of individuals within each planning area or region. Standardising all plots to proportions instead of absolute figures makes it easier to compare and observe differences in demographic makeup.
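As a minimal sketch of this standardisation, assuming a summarised table with the columns PA, Sex, AG and Population, the share of each group within its planning area could be computed as follows. The planning areas "A" and "B" and their figures are made up purely for illustration.

```r
library(dplyr)

# Toy data: two hypothetical planning areas of very different sizes
toy_pop <- data.frame(
  PA         = c("A", "A", "B", "B"),
  Sex        = c("Males", "Females", "Males", "Females"),
  AG         = "30_to_34",
  Population = c(6000, 4000, 30, 20)
)

# Express each row as a percentage of its planning area's total population
toy_pct <- toy_pop %>%
  group_by(PA) %>%
  mutate(Pct_PA = 100 * Population / sum(Population)) %>%
  ungroup()

toy_pct
```

Although area "A" is 200 times larger than area "B" in this toy example, both end up with the same 60/40 split, which is what makes the two directly comparable on a shared axis.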
The proposed design that will improve the above conditions is as follows:
Install and load the following R packages:
# Install (if not yet installed) and load the required R packages
packages <- c('ggplot2', 'tidyverse')
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
Data wrangling is first carried out using the pivot table function in Excel, which gives us the required information in a table as follows:
| PA | Sex | AG | Population | Pct_PA |
|---|---|---|---|---|
| Planning Area | Male / Female | Age Group in 5 year bins | Population | Population as percentage of planning area |
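The same aggregation could in principle be done in R instead of Excel. The sketch below is only an illustration: the raw column names (PA, SZ, AG, Sex, Pop, Time) are an assumption about the layout of the source file, the subzone labels are placeholders, and the toy rows stand in for the real data.

```r
library(dplyr)

# Toy stand-in for the raw file; the column names here are assumed
raw_pop <- data.frame(
  PA   = "Ang Mo Kio",
  SZ   = c("Subzone 1", "Subzone 2", "Subzone 1", "Subzone 1"),
  AG   = "0_to_4",
  Sex  = c("Males", "Males", "Females", "Males"),
  Pop  = c(100, 150, 200, 90),
  Time = c(2019, 2019, 2019, 2018)
)

# Keep 2019 only, then sum over subzones (and any other breakdowns)
# so that one row remains per planning area, sex and age group
pop_summary <- raw_pop %>%
  filter(Time == 2019) %>%
  group_by(PA, Sex, AG) %>%
  summarise(Population = sum(Pop), .groups = "drop")

pop_summary
```

The Pct_PA column would then be each row's Population divided by the total Population of its planning area, multiplied by 100.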
In addition, we slightly adjust the values in column AG, such that “5_to_9” is reflected as “05_to_09” and “0_to_4” is reflected as “00_to_04”. Because the labels are sorted as text, this zero-padding ensures the age groups sort in the correct order later.
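If the relabelling were done in R rather than by hand, one possible approach is a regular expression that pads every standalone single digit with a leading zero. This is a sketch only; the vector of labels below is illustrative rather than the full set of age groups.

```r
# Illustrative age-group labels before adjustment
ag_raw <- c("0_to_4", "5_to_9", "10_to_14", "90_and_over")

# Pad any digit that has no digit directly before or after it
ag_padded <- gsub("(?<!\\d)(\\d)(?!\\d)", "0\\1", ag_raw, perl = TRUE)

# Text sorting now matches the numeric order of the age groups
sort(ag_padded)
```

Without the padding, "10_to_14" would sort ahead of "5_to_9" because text is compared character by character.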
In order to create overall views on a national and regional level, new data sets will be added by grouping each planning area into their regions and summing up the populations, while maintaining the same level of breakdown in terms of sex and age bands.
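The grouping and summing described above could be sketched in R as follows. This assumes each planning-area row already carries a Region value; the toy populations are made up, and the "Singapore" label for the national rows is a hypothetical choice. The "Regional" and "National" values in the Region column mirror the ones used when subsetting later.

```r
library(dplyr)

# Toy planning-area rows with made-up populations
pa_pop <- data.frame(
  Region     = c("North", "North", "North", "North", "East", "East"),
  PA         = c("Woodlands", "Woodlands", "Yishun", "Yishun", "Bedok", "Bedok"),
  Sex        = rep(c("Males", "Females"), times = 3),
  AG         = "30_to_34",
  Population = c(100, 110, 90, 95, 120, 130)
)

# Regional view: sum planning areas within each region, keeping Sex and AG
regional_rows <- pa_pop %>%
  group_by(Region, Sex, AG) %>%
  summarise(Population = sum(Population), .groups = "drop") %>%
  mutate(PA = Region, Region = "Regional")

# National view: sum over all planning areas, keeping Sex and AG
national_rows <- pa_pop %>%
  group_by(Sex, AG) %>%
  summarise(Population = sum(Population), .groups = "drop") %>%
  mutate(PA = "Singapore", Region = "National")

# Append both aggregate levels to the planning-area rows
pop_all <- bind_rows(pa_pop, regional_rows, national_rows)
```

Storing the region name in PA for the regional rows means the same facet variable can be reused at both the regional and the planning area level.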
For each planning area, the title of the graph includes both the name of the planning area as well as the total population count. This will be obtained by adding the population count next to the planning area name in the original dataset.
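A label of this kind could also be built in R instead of editing the data set by hand. The sketch below uses made-up populations, and the column name PA_label is a hypothetical choice.

```r
# Toy rows for two planning areas with made-up populations
pa_pop <- data.frame(
  PA         = c("Woodlands", "Woodlands", "Yishun", "Yishun"),
  Sex        = rep(c("Males", "Females"), times = 2),
  Population = c(100, 110, 90, 95)
)

# Total population per planning area, repeated on every row of that area
pa_total <- ave(pa_pop$Population, pa_pop$PA, FUN = sum)

# Combine name and total into one label, e.g. for use as a facet title
pa_pop$PA_label <- paste0(
  pa_pop$PA, " (Population: ",
  format(pa_total, big.mark = ",", trim = TRUE), ")"
)

unique(pa_pop$PA_label)
```

Faceting on PA_label instead of PA would then place the population count in each panel title.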
When the population of a planning area is very small, the proportions in its population pyramid become skewed. One example is Lim Chu Kang, whose population of only 70 results in certain age groups being over-represented in percentage terms. Combining such planning areas with the next closest planning area allows the graphs to be better proportioned. The planning areas that were combined are:
The data is then imported using the read_csv function from the tidyverse:
# Load the data set from the .csv file
pop_data <- read_csv("Pop_data_Summarised.csv")
To provide a point of comparison against the entire population, we first build a population pyramid plot that breaks down Singapore’s population by sex and age band.
The first step is to create a new data frame, “national_pop”, which extracts the national-level rows that were created and appended to the initial data set. This can be done using the subset function.
national_pop <- subset(pop_data, Region == "National")
We then plot a population pyramid from this data frame using the following steps:
# creating the 2 separate bar charts, one for each gender
ggplot(data = national_pop, aes(x = AG, y = Pct_PA, fill = Sex)) +
  geom_bar(data = national_pop %>% filter(Sex == "Males"),
           stat = "identity",
           position = "identity") +
  geom_bar(data = national_pop %>% filter(Sex == "Females"),
           stat = "identity",
           position = "identity",
           mapping = aes(y = -(Pct_PA))) +
  # adding the title, subtitle, data and axis labels
  labs(x = "Age Group",
       y = "Percentage of Population (%)",
       title = "Population Demographics of Singapore (2019)",
       subtitle = paste("Population:", sum(national_pop$Population))) +
  geom_text(data = national_pop %>% filter(Sex == "Males"),
            aes(label = round(abs(Pct_PA), digits = 2), hjust = -0.1),
            size = rel(9)) +
  geom_text(data = national_pop %>% filter(Sex == "Females"),
            aes(y = -(Pct_PA), label = round(Pct_PA, digits = 2), hjust = 1.1),
            size = rel(9)) +
  # to reflect the negative axis as a positive figure for females
  scale_y_continuous(labels = abs) +
  # flip coords to get horizontal bar charts
  coord_flip() +
  # adjust theme, format and size
  theme_minimal() +
  theme(axis.text = element_text(size = rel(3)),
        axis.title = element_text(size = rel(3), face = "bold"),
        plot.title = element_text(size = rel(5), face = "bold"),
        legend.title = element_text(size = rel(3)),
        legend.text = element_text(size = rel(3)),
        plot.subtitle = element_text(size = rel(3.5)))
To compare the population across different regions in Singapore, a regional-level population plot is created. Similar to the steps in 4.4, the first step is to create a new data frame, “Region_pop”, which extracts the regional-level rows that were created and appended to the initial data set. This can also be done using the subset function.
Region_pop <- subset(pop_data, Region == "Regional")
A similar population pyramid is then plotted from this data frame using the following steps:
# Repeating the steps done for the national-level plot
ggplot(data = Region_pop, aes(x = AG, y = Pct_PA, fill = Sex)) +
  geom_bar(data = Region_pop %>% filter(Sex == "Males"),
           stat = "identity",
           position = "identity") +
  geom_bar(data = Region_pop %>% filter(Sex == "Females"),
           stat = "identity",
           position = "identity",
           mapping = aes(y = -(Pct_PA))) +
  labs(x = "Age Group",
       y = "Percentage of Population in Region (%)",
       title = "Population Demographics of Singapore at a Regional level (2019)",
       subtitle = " ") +
  geom_text(data = Region_pop %>% filter(Sex == "Males"),
            aes(label = round(abs(Pct_PA), digits = 2), hjust = -0.1),
            size = rel(9)) +
  geom_text(data = Region_pop %>% filter(Sex == "Females"),
            aes(y = -(Pct_PA), label = round(Pct_PA, digits = 2), hjust = 1.1),
            size = rel(9)) +
  scale_y_continuous(labels = abs) +
  coord_flip() +
  theme_minimal() +
  # creating a facet wrap for each region
  facet_wrap(~PA, ncol = 2) +
  # adjust the size and format of plot elements
  theme(strip.text = element_text(size = rel(3), face = "bold"),
        strip.background = element_rect(fill = "#E8F6F3"),
        axis.text = element_text(size = rel(3)),
        axis.title = element_text(size = rel(3), face = "bold"),
        plot.title = element_text(size = rel(5), face = "bold"),
        legend.title = element_text(size = rel(2.5)),
        legend.text = element_text(size = rel(2.5)))
Repeating the steps in 4.5, we create a facet plot for each region. The following is an example for the North region. First, we use the subset function to create a new data frame, “North_pop”, containing the planning areas within the North region.
North_pop <- subset(pop_data, Region == "North")
We then adapt the approach from 4.5 to the new data frame “North_pop”, which gives a view of the demographics at the planning area level.
# Repeating the steps done for the regional-level plot
ggplot(data = North_pop, aes(x = AG, y = Pct_PA, fill = Sex)) +
  geom_bar(data = North_pop %>% filter(Sex == "Males"),
           stat = "identity",
           position = "identity") +
  geom_bar(data = North_pop %>% filter(Sex == "Females"),
           stat = "identity",
           position = "identity",
           mapping = aes(y = -(Pct_PA))) +
  labs(x = "Age Group",
       y = "Percentage of Population in Planning Area (%)",
       title = "Population Demographics of Planning Areas in Singapore's North Region (2019)",
       subtitle = " ") +
  geom_text(data = North_pop %>% filter(Sex == "Males"),
            aes(label = round(abs(Pct_PA), digits = 2), hjust = -0.1),
            size = rel(9)) +
  geom_text(data = North_pop %>% filter(Sex == "Females"),
            aes(y = -(Pct_PA), label = round(Pct_PA, digits = 2), hjust = 1.1),
            size = rel(9)) +
  scale_y_continuous(labels = abs) +
  coord_flip() +
  theme_minimal() +
  # creating a facet wrap for each planning area within the region
  facet_wrap(~PA, ncol = 2) +
  # adjust the size and format of plot elements
  theme(strip.text = element_text(size = rel(3), face = "bold"),
        strip.background = element_rect(fill = "#E8F6F3"),
        axis.text = element_text(size = rel(3)),
        axis.title = element_text(size = rel(3), face = "bold"),
        plot.title = element_text(size = rel(5), face = "bold"),
        legend.title = element_text(size = rel(2.5)),
        legend.text = element_text(size = rel(2.5)))
The same method will be repeated for all the different Regions: North-East, Central, East and West.
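Rather than copying the plotting code once per region, the repetition could be wrapped in a function. The sketch below is one possible way to do this, not the method used for the plots above: plot_region_pyramid is a hypothetical helper, the toy pop_data stands in for the real data set, and the styling is trimmed down. It also swaps the two geom_bar layers for a single geom_col on a signed percentage column.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: draws the faceted pyramid for one region of pop_data
plot_region_pyramid <- function(pop_data, region) {
  region_pop <- pop_data %>%
    filter(Region == region) %>%
    # Mirror females onto the negative side of the axis
    mutate(Pct_signed = ifelse(Sex == "Females", -Pct_PA, Pct_PA))

  ggplot(region_pop, aes(x = AG, y = Pct_signed, fill = Sex)) +
    geom_col() +
    # Show the mirrored (negative) side with positive axis labels
    scale_y_continuous(labels = abs) +
    coord_flip() +
    facet_wrap(~PA, ncol = 2) +
    labs(x = "Age Group",
         y = "Percentage of Population in Planning Area (%)",
         title = paste0("Population Demographics of Planning Areas in Singapore's ",
                        region, " Region (2019)")) +
    theme_minimal()
}

# Toy stand-in for pop_data with made-up percentages
pop_data <- data.frame(
  Region = rep(c("North", "East"), each = 4),
  PA     = rep(c("Woodlands", "Bedok"), each = 4),
  AG     = rep(c("00_to_04", "05_to_09"), times = 4),
  Sex    = rep(rep(c("Males", "Females"), each = 2), times = 2),
  Pct_PA = c(30, 20, 28, 22, 26, 24, 27, 23)
)

# One plot per region, skipping the aggregate "National" and "Regional" rows
regions <- setdiff(unique(pop_data$Region), c("National", "Regional"))
plots <- lapply(regions, function(r) plot_region_pyramid(pop_data, r))
names(plots) <- regions
```

Each element of plots can then be printed or saved individually, for example with print(plots[["North"]]).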
The visualization offers insights into Singapore’s population demographics at varying levels, for audiences ranging from policy planners looking to improve facilities to suit the population, to businesses and individuals looking for the right location for their venture. Some examples include:
At a national level, Singapore’s population pyramid is of the “contracting” type and shows a slow shift towards an aging population, with the bulk of the population being above 35 years old. Policy makers should focus on improving facilities for the elderly, and individuals can look at providing better-suited services, such as healthcare or recreational activities for retirement.
In addition, there are more females above the age of 25 than males at a national level.
At a regional level, however, the North and North-east regions appear to have more middle-aged individuals. Middle-aged individuals are more likely to be starting families in these areas, which will generate demand for facilities such as schools, childcare centres and other family-friendly shops.
At a granular level, it can be observed that the age distribution of the North-east region is skewed by the planning areas Sengkang, Punggol and Seletar. These planning areas have a large proportion of middle-aged individuals (~30% between 30 and 44) as well as a substantial proportion of children and youths aged below 20. This is likely attributable to the increasing number of BTO flats for new families in these planning areas, whose specific demands policy makers and businesses could meet.
On the other hand, the Central region appears to have an older population, with a larger bulk of individuals over the age of 40. Looking at the planning areas within the Central region, many of them have a larger proportion of middle-aged to elderly individuals, with Queenstown and Kallang being good examples. However, not every planning area shows the same pyramid shape (e.g. River Valley), so each planning area should be examined individually rather than inferred from its region.
The above are just some examples of how the visualization can be used; it can provide many more insights for individuals and businesses alike.