1.0 Introduction

This data visualization aims to reveal insights into the demographic structure of Singapore's population, using the data set “Singapore Residents by Planning Area, Age Group, Sex and Type of Dwelling, June 2011-2019” obtained from the Singapore Department of Statistics. It focuses on the demographic structure of each planning area by age group and gender in 2019.

Note: The visualization will only include planning areas with residents and/or where information is available.

2.0 Data and Design Challenges

2.1 Original Data Set does not allow easy analysis

The original data set is not in a form that allows easy analysis. The data appears as a long list of data points, which makes it difficult to spot trends or make comparisons, and it cannot be visualized directly.

To solve this, the data first has to be transformed and processed. Once it is in a suitable form, we can create plots that help in understanding trends, comparisons and insights about the demographics of the Singapore population.

2.2 No national and regional views

The data set is not aggregated to the national or regional level, which makes it difficult to observe the demographics at a more macro level.

For such cases, the most useful visualization would be an interactive one that allows a user to switch between views. However, given that a static data visualization has been requested, the next best alternative is to create multiple views at (i) a National, (ii) a Regional and (iii) a Planning Area level. This gives viewers several angles from which to look at the population demographics, depending on their needs.

2.3 Difficulty in comparing across populations due to different absolute sizes

The absolute number of individuals differs greatly across and within all levels (National, Regional, Planning Area), which makes it difficult to compare population demographics properly. In visualizations that use absolute population numbers, the bars for smaller planning areas would appear skewed or too small to be compared against those of larger ones.

To solve this problem, the visualizations should instead show the proportion of each planning area's or region's population that falls into each age group and gender. By standardising all plots to show proportions instead of absolute figures, it will be easier to compare and observe the differences in demographic makeup.
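
As a minimal sketch of this standardisation, the snippet below converts counts into percentages of each planning area's total. The data frame `toy` and its counts are made up for illustration; only the column names PA, Sex, Population and Pct_PA follow the summarised table described in section 4.2.

```r
# Toy data: hypothetical counts, not taken from the real data set
toy <- data.frame(
  PA = c("Area A", "Area A", "Area B", "Area B"),
  Sex = c("Males", "Females", "Males", "Females"),
  Population = c(300, 700, 30, 20)
)

# ave() returns, for every row, the total population of that row's planning area
pa_total <- ave(toy$Population, toy$PA, FUN = sum)

# Express each group as a percentage of its own planning area
toy$Pct_PA <- 100 * toy$Population / pa_total
toy
```

Although Area A is twenty times larger than Area B, both now sit on the same 0 to 100 scale and can be compared directly.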

3.0 Proposed Design

The proposed design that will improve the above conditions is as follows:

4.0 Step-by-Step Instructions

4.1 Install and Load R packages

Install and load the following R packages:

ggplot2
tidyverse

#to install (if not yet installed) and load the following R packages

packages <- c('ggplot2', 'tidyverse')

for (p in packages){
  if (!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

4.2 Processing the Data Set

Data wrangling is first carried out using the pivot table function in Excel, which gives us the required information in a table as follows:

PA	Sex	AG	Population	Pct_PA
Planning Area	Male / Female	Age Group in 5 year bins	Population	Population as percentage of planning area
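
The same summary could also be produced in R rather than Excel. The sketch below is one possible alternative, not the method used here. It assumes the raw file has columns named PA, AG, Sex, TOD (type of dwelling), Pop and Time, which should be checked against the actual download; the rows in `raw` are invented.

```r
# Hypothetical stand-in for the raw file; column names are an assumption
raw <- data.frame(
  PA   = "Area A",
  AG   = "0_to_4",
  Sex  = c("Males", "Males", "Females", "Males"),
  TOD  = c("Type 1", "Type 2", "Type 1", "Type 1"),
  Pop  = c(10, 20, 15, 99),
  Time = c(2019, 2019, 2019, 2018),
  stringsAsFactors = FALSE
)

# Keep 2019 only, then sum over dwelling types (the "pivot" step)
pop_2019 <- subset(raw, Time == 2019)
summarised <- aggregate(Pop ~ PA + Sex + AG, data = pop_2019, FUN = sum)

# Rename to match the column name used in the summarised table
names(summarised)[names(summarised) == "Pop"] <- "Population"
summarised
```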

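In addition, we slightly adjust the values in column AG, such that “5_to_9” is reflected as “05_to_09” and “0_to_4” is reflected as “00_to_04”. Without the leading zeros, a text sort would place “10_to_14” before “5_to_9”; padding the labels lets the age groups sort in their natural order later.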
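
If this relabelling is done in R instead of by hand, a regular expression is one way to do it. The helper name `pad_age_group` is hypothetical, and the label "90_and_over" is assumed for the top age group.

```r
# Add a leading zero to every single-digit number in an age-group label.
# The lookarounds make sure digits that are part of a longer number are skipped.
pad_age_group <- function(ag) {
  gsub("(?<![0-9])([0-9])(?![0-9])", "0\\1", ag, perl = TRUE)
}

labels <- c("0_to_4", "5_to_9", "10_to_14", "90_and_over")
padded <- pad_age_group(labels)
padded

# After padding, a plain text sort follows the age order
sort(padded)
```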

In order to create overall views on a national and regional level, new data sets will be added by grouping each planning area into their regions and summing up the populations, while maintaining the same level of breakdown in terms of sex and age bands.
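
A rough sketch of this aggregation in base R is shown below. The counts are made up, and the way aggregated rows are labelled (Region set to "Regional" or "National", with the region name stored in PA, and "Singapore" as the national PA value) is an assumption inferred from how the later code subsets and facets the data.

```r
# Hypothetical planning-area level rows (counts are invented)
pa_level <- data.frame(
  Region = rep(c("North", "East"), each = 4),
  PA = rep(c("Yishun", "Woodlands", "Bedok", "Tampines"), each = 2),
  Sex = rep(c("Males", "Females"), times = 4),
  AG = "00_to_04",
  Population = c(10, 20, 30, 40, 50, 60, 70, 80),
  stringsAsFactors = FALSE
)

# Regional level: sum planning areas within each region, keeping Sex and AG
regional <- aggregate(Population ~ Region + Sex + AG, data = pa_level, FUN = sum)
regional$PA <- regional$Region      # region name becomes the facet label
regional$Region <- "Regional"

# National level: sum over everything, keeping Sex and AG
national <- aggregate(Population ~ Sex + AG, data = pa_level, FUN = sum)
national$PA <- "Singapore"
national$Region <- "National"

# Append the new rows to the planning-area rows
cols <- c("Region", "PA", "Sex", "AG", "Population")
combined <- rbind(pa_level[cols], regional[cols], national[cols])
combined
```

The percentage column would then need to be recomputed for the new rows, since the denominator changes at each level.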

For each planning area, the title of the graph includes both the name of the planning area as well as the total population count. This will be obtained by adding the population count next to the planning area name in the original dataset.
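
One way to build such labels in R is sketched below. The exact label format is an assumption, since the text only says the count is placed next to the name, and the column name `PA_label` is hypothetical.

```r
# Toy data with invented counts
toy <- data.frame(
  PA = c("Area A", "Area A", "Area B", "Area B"),
  Population = c(300, 700, 30, 20),
  stringsAsFactors = FALSE
)

# Total population of each row's planning area
totals <- ave(toy$Population, toy$PA, FUN = sum)

# Combine name and formatted total into one label per row
toy$PA_label <- paste0(toy$PA, " (", format(totals, big.mark = ",", trim = TRUE), ")")
unique(toy$PA_label)
```

Faceting on a column like `PA_label` instead of `PA` would then show the count in each panel title.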

Where the population of a planning area is very small, the proportions in its population pyramid become skewed. One example is Lim Chu Kang, which has a population of only 70, resulting in certain age groups being over-represented in percentage terms. Combining such planning areas with the next closest planning area allows the graphs to be better proportioned. The planning areas that were combined are:

Choa Chu Kang & Western Water Catchment
Orchard & Museum
Punggol & Seletar
Sungei Kadut & Lim Chu Kang
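
If the merging is done in R, a lookup vector is one simple option. The names `merge_map` and `combine_pa` are hypothetical and the counts in `toy` are invented; only the four pairings come from the list above.

```r
# Map each original planning area to its combined name
merge_map <- c(
  "Choa Chu Kang"           = "Choa Chu Kang & Western Water Catchment",
  "Western Water Catchment" = "Choa Chu Kang & Western Water Catchment",
  "Orchard"                 = "Orchard & Museum",
  "Museum"                  = "Orchard & Museum",
  "Punggol"                 = "Punggol & Seletar",
  "Seletar"                 = "Punggol & Seletar",
  "Sungei Kadut"            = "Sungei Kadut & Lim Chu Kang",
  "Lim Chu Kang"            = "Sungei Kadut & Lim Chu Kang"
)

# Return the combined name where one exists, otherwise the original name
combine_pa <- function(pa) {
  merged <- unname(merge_map[pa])
  ifelse(is.na(merged), pa, merged)
}

toy <- data.frame(
  PA = c("Lim Chu Kang", "Sungei Kadut", "Yishun"),
  Sex = "Males",
  AG = "00_to_04",
  Population = c(5, 40, 900),
  stringsAsFactors = FALSE
)

# Rename, then sum the populations that now share a name
toy$PA <- combine_pa(toy$PA)
merged_toy <- aggregate(Population ~ PA + Sex + AG, data = toy, FUN = sum)
merged_toy
```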

4.3 Load Data Set

The data is then imported and read using the read_csv function from the tidyverse:

# To Load the Data set from .csv file

pop_data <- read_csv("Pop_data_Summarised.csv")

4.4 View 1: Creating a Pyramid Plot For the Entire Population

To provide a baseline against which the regions and planning areas can be compared, we first build a population pyramid plot for the whole of Singapore, broken down by Sex and Age Band.

4.4.1 Pre-processing Variable

The first step is to create a new data frame, “national_pop”, which extracts the National rows that were created and appended to the initial data set. This can be done using the subset function.

national_pop <- subset(pop_data, Region == "National")

4.4.2 Plotting the Pyramid Plot

We then plot a population pyramid from the data frame above, using the following steps:

Create 2 separate bar charts (one for each gender) using ggplot
Negate the values for females so that the bars for the two genders go in opposite directions
Using coord_flip(), switch the bars from vertical to horizontal
Add data labels using geom_text and adjust their position
Adjust the format of other features in the plot

#creating the 2 separate bar charts for each gender
ggplot(data = national_pop, aes(x = AG, y = Pct_PA, fill = Sex)) +
  geom_bar(data = national_pop %>% filter(Sex == "Males"),
           stat = "identity",
           position = "identity") +
  geom_bar(data = national_pop %>% filter(Sex == "Females"),
           stat = "identity",
           position = "identity",
           mapping = aes(y = -(Pct_PA))) +
  
# adding the title, subtitle, data and axis labels 
  labs(x = "Age Group",
       y = "Percentage of Population (%)",
       title = "Population Demographics of Singapore (2019)",
       subtitle = paste("Population:", sum(national_pop$Population))) +
  geom_text(data = national_pop %>% filter(Sex == "Males"),
            aes(label = round(abs(Pct_PA), digits = 2)),
            hjust = -0.1, size = rel(9)) +
  geom_text(data = national_pop %>% filter(Sex == "Females"),
            aes(y = -(Pct_PA), label = round(Pct_PA, digits = 2)),
            hjust = 1.1, size = rel(9)) +
  
# to reflect the negative axis as a positive figure for females
  scale_y_continuous(labels = abs) +
  
# Flip coords to get horizontal bar charts
  coord_flip() +

# adjust theme, format and size
  theme_minimal() +
  theme(axis.text = element_text(size = rel(3)), axis.title = element_text(size = rel(3), face="bold"), plot.title = element_text (size = rel(5), face ="bold"), legend.title = element_text(size = rel(3)), legend.text = element_text(size = rel(3)), plot.subtitle = element_text (size = rel(3.5)))

4.5 View 2: Creating a plot to compare the Demographics at a regional level

To compare the population across different regions in Singapore, a regional-level view is created. Similar to the steps in 4.4, the first step is to create a new data frame, “Region_pop”, which extracts the regional-level rows that were created and appended to the initial data set. This can also be done using the subset function.

Region_pop <- subset(pop_data, Region == "Regional")

We then plot a similar population pyramid from the data frame above, using the following steps:

Create 2 separate bar charts (one for each gender) using ggplot
Negate the values for females so that the bars for the two genders go in opposite directions
Using coord_flip(), switch the bars from vertical to horizontal
Add data labels using geom_text and adjust their position
Use facet_wrap to split the plot into regions
Adjust the format of other features in the plot and facets

#Repeating the steps done for the national level plot
ggplot(data = Region_pop, aes(x = AG, y = Pct_PA, fill = Sex)) +
  geom_bar(data = Region_pop %>% filter(Sex == "Males"),
           stat = "identity",
           position = "identity") +
  geom_bar(data = Region_pop %>% filter(Sex == "Females"),
           stat = "identity",
           position = "identity",
           mapping = aes(y = -(Pct_PA))) +
  labs ( x = "Age Group", y = "Percentage of Population in Region (%)",  title = "Population Demographics of Singapore at a Regional level (2019)", subtitle = " ") + 
  geom_text(data = Region_pop %>% filter(Sex == "Males"),
            aes(label = round(abs(Pct_PA), digits = 2)),
            hjust = -0.1, size = rel(9)) +
  geom_text(data = Region_pop %>% filter(Sex == "Females"),
            aes(y = -(Pct_PA), label = round(Pct_PA, digits = 2)),
            hjust = 1.1, size = rel(9)) +
  scale_y_continuous(labels = abs) +
  coord_flip() +
  theme_minimal() +
  
# creating a facet wrap for each region
  facet_wrap(~PA, ncol = 2) +

#adjust the size and format of plot elements
  theme(strip.text = element_text(size = rel(3), face = "bold"), strip.background = element_rect(fill = "#E8F6F3"), axis.text = element_text(size = rel(3)), axis.title = element_text(size = rel(3), face="bold"), plot.title = element_text (size = rel(5), face ="bold"),legend.title = element_text(size = rel(2.5)), legend.text = element_text(size = rel(2.5)))

4.6 View 3: Creating Multiple Facet plots to compare the demographics of each planning area within each region

Repeating the steps in 4.5, we create a facet plot for each region. The following is an example for the North Region. First, we create a new data frame, “North_pop”, containing the planning areas within the North region, using the subset function.

North_pop <- subset(pop_data, Region == "North")

We then adapt the approach from 4.5 to use the new data frame “North_pop”, which gives us a view of the demographics at a planning area level.

#Repeating the steps done for the Regional Level plot
ggplot(data = North_pop, aes(x = AG, y = Pct_PA, fill = Sex)) +
  geom_bar(data = North_pop %>% filter(Sex == "Males"),
           stat = "identity",
           position = "identity") +
  geom_bar(data = North_pop %>% filter(Sex == "Females"),
           stat = "identity",
           position = "identity",
           mapping = aes(y = -(Pct_PA))) +
  labs ( x = "Age Group", y = "Percentage of Population in Planning Area (%)",  title = "Population Demographics of Planning Areas in Singapore's North Region (2019)", subtitle = " ") + 
  geom_text(data = North_pop %>% filter(Sex == "Males"),
            aes(label = round(abs(Pct_PA), digits = 2)),
            hjust = -0.1, size = rel(9)) +
  geom_text(data = North_pop %>% filter(Sex == "Females"),
            aes(y = -(Pct_PA), label = round(Pct_PA, digits = 2)),
            hjust = 1.1, size = rel(9)) +
  scale_y_continuous(labels = abs) +
  coord_flip() +
  theme_minimal() +
  
# creating a facet wrap for each planning area within the region
  facet_wrap(~PA, ncol = 2) +

#adjust the size and format of plot elements
   theme(strip.text = element_text(size = rel(3), face = "bold"), strip.background = element_rect(fill = "#E8F6F3"), axis.text = element_text(size = rel(3)), axis.title = element_text(size = rel(3), face="bold"), plot.title = element_text (size = rel(5), face ="bold"),legend.title = element_text(size = rel(2.5)), legend.text = element_text(size = rel(2.5)))

The same method will be repeated for all the different Regions: North-East, Central, East and West.
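
Rather than copying the code once per region, the repetition could be wrapped in a function. The sketch below is a simplified illustration, not the code used for the final plots: the function name `plot_region_pyramids` is hypothetical, data labels and text sizing are omitted, `toy_pop` is invented data, and the Region values other than "North" are assumed to match the names listed above.

```r
library(ggplot2)

# Build a faceted population pyramid for one region
plot_region_pyramids <- function(pop_data, region) {
  region_pop <- subset(pop_data, Region == region)

  # Negate female percentages so their bars extend in the opposite direction
  region_pop$Pct_signed <- ifelse(region_pop$Sex == "Females",
                                  -region_pop$Pct_PA, region_pop$Pct_PA)

  ggplot(region_pop, aes(x = AG, y = Pct_signed, fill = Sex)) +
    geom_col() +
    scale_y_continuous(labels = abs) +   # show the negative side as positive
    coord_flip() +
    facet_wrap(~PA, ncol = 2) +
    labs(x = "Age Group",
         y = "Percentage of Population in Planning Area (%)",
         title = paste0("Population Demographics of Planning Areas in Singapore's ",
                        region, " Region (2019)")) +
    theme_minimal()
}

regions <- c("North", "North-East", "Central", "East", "West")

# Invented stand-in for pop_data: one planning area per region
toy_pop <- expand.grid(Region = regions,
                       Sex = c("Males", "Females"),
                       AG = c("00_to_04", "05_to_09"),
                       stringsAsFactors = FALSE)
toy_pop$PA <- paste(toy_pop$Region, "area")
toy_pop$Pct_PA <- 25

# One plot per region, stored in a named list
plots <- lapply(regions, function(r) plot_region_pyramids(toy_pop, r))
names(plots) <- regions
```

Calling `plots[["North"]]` would then draw the North region view, and passing the real `pop_data` instead of `toy_pop` would use the loaded data.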

5.0 Final Visualization

5.1 Description of Visualization and Insights

The visualization offers insights into Singapore’s population demographics at varying levels, for audiences ranging from policy planners looking to improve facilities to suit the population, to businesses and individuals looking for the right location for their venture. Some examples include:

At a national level, Singapore’s population pyramid is a “contracting” type and shows a slow shift towards an aging population, with the main bulk of the population being above 35 years old. Policy makers should focus on improving facilities for the elderly, and individuals can look at providing more suitable services, such as healthcare or recreational activities for retirees.
In addition, there are more females than males above the age of 25 at a national level.
At a regional level, however, the North and North-East regions appear to have more middle-aged individuals. Middle-aged individuals in these areas are more likely to be starting families, which will generate demand for facilities such as schools, childcare centres and other family-friendly shops in the area.
At a granular level, it can be observed that the age distribution of the North-East region is skewed by the planning areas Sengkang and Punggol & Seletar. These planning areas have a large proportion of middle-aged individuals (~30% between 30 and 44) as well as a substantial proportion of children and youths aged below 20. This is likely due to the increasing number of BTO flats for new families in these planning areas, and these families would have specific demands that policy makers and businesses could meet.
On the other hand, the Central region appears to have an older population, with a larger bulk of individuals over the age of 40. Looking at the planning areas within the Central region, many of them have a larger proportion of middle-aged to elderly individuals, with Queenstown and Kallang being good examples. However, not every planning area shows the same pyramid shape (e.g. River Valley), so each planning area should be examined individually rather than assumed to follow its region.

The above are just some examples of how the visualization can be used; it can provide many more insights for individuals and businesses alike.

The ‘Greying’ Population of Singapore

Chung Wei Han, Shaun - ISSS608 Assignment 4