1. Introduction

This data visualization aims to uncover the demographic patterns of generational age groups across Singapore's planning areas in 2019. Singapore has 55 planning areas, which are the geographical units into which the country is divided.

2. Major data and design challenges

An Excel sheet of population counts by planning area, subzone, age group, sex, type of dwelling and year was downloaded from https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data.

| Type of challenge | Description |
|---|---|
| Data | The data covers 2011 to 2019, so it must be filtered to keep only the 2019 records. |
| Data | Each age bin spans 5 years, except the last bin, “90 & Over”. This produces many age bins, and plotting every bin separately would not help the reader. Grouping the bins into generational age groups makes the chart easier to read. |
| Data | Some planning areas and age groups have far fewer people than others, so the raw counts in the “Pop” column are hard to compare. We use the percentage of each planning area's population that belongs to each generation group instead. |
| Data | The data must be grouped by planning area and generation group to compute these percentages. |
| Data | Some planning areas have a population of 0. These are removed so that they do not appear in the visualization and confuse readers. |
| Design | Singapore has many planning areas, so the chart must give readers a quick overview of the age demographics of every planning area. |
| Design | ggplot pads the axes slightly by default, so we need to make sure the bars start exactly at 0 to avoid confusing readers. |
| Design | ggplot's default palette uses both red and green, which readers with red-green color blindness find hard to tell apart, so the palette must be changed. |

2.1 Proposed Visualization

Please note that the color that looks red in the sketch is meant to be pink; it did not come out well on camera.

3. Step-by-step description

3.1 Installing packages and reading data

The tidyverse package is installed if it is missing and then loaded; it bundles the packages used here for data manipulation and plotting, such as readr, dplyr and ggplot2. The file demographics2.csv, which contains the downloaded data, is read into RStudio with the read_csv function. The data is then filtered to keep only the rows collected in 2019 whose population is not 0.

packages <- c('tidyverse')
for (p in packages) {
  # install the package only if it is not already available, then load it
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
# read the downloaded SingStat data
demographics <- read_csv("demographics2.csv")
# keep only the 2019 rows with a non-zero population
demographics <- demographics %>%
  filter(Time == 2019, Pop != 0)

3.2 Identify demographic age groups

We can now define generational age groups for the filtered data. According to the Center for Generational Kinetics (https://genhq.com/faq-info-about-generations/), there are currently five primary generations that make up our society. The age range of each generation is shown below.

| Generation group | Age |
|---|---|
| Generation Z | 0-24 |
| Millennials | 25-43 |
| Generation X | 44-55 |
| Baby Boomers | 56-74 |
| Silent Generation | Above 74 |

3.3 Create generation age groups

The generation age groups are created from the table above. First, each row is matched to a generation based on its value in the age column, AG.

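# Assumes AG holds a numeric age (e.g. the lower bound of each 5-year bin).
# Inclusive upper bounds make sure no age falls between two groups.
ind1 <- demographics$AG >= 0  & demographics$AG <= 24
ind2 <- demographics$AG > 24  & demographics$AG <= 43
ind3 <- demographics$AG > 43  & demographics$AG <= 55
ind4 <- demographics$AG > 55  & demographics$AG <= 74
ind5 <- demographics$AG > 74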
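
The comparisons above only work if AG holds numbers. If the file instead stores text labels such as “0_to_4” or “90 & Over” (an assumption about its format, not confirmed by the source), the labels would first need converting to numbers. A minimal base R sketch, using a hypothetical helper called age_lower_bound(), is shown below. Because some 5-year bins straddle two generations (40 to 44, for instance, spans both Millennials and Generation X in the table), assigning each bin by its lower bound is only an approximation.

```r
# Hypothetical helper: return the lower bound of each age-group label.
# Assumes every label starts with its lowest age, e.g. "0_to_4" or "90 & Over";
# labels that do not start with digits become NA (with a coercion warning).
age_lower_bound <- function(labels) {
  # keep only the leading digits, dropping the rest of the label
  as.integer(sub("^[[:space:]]*([0-9]+).*$", "\\1", labels))
}

ages <- age_lower_bound(c("0_to_4", "20_to_24", "55_to_59", "90 & Over"))
ages  # 0 20 55 90
```

It would be applied before the comparisons, for example demographics$AG <- age_lower_bound(demographics$AG).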

A new column called “generation” is then created, and each row is labelled with the name of its generation group.

# create the column first so that rows can be filled in by index
demographics$generation <- NA_character_
demographics$generation[ind1] <- 'Generation Z'
demographics$generation[ind2] <- 'Millennials'
demographics$generation[ind3] <- 'Generation X'
demographics$generation[ind4] <- 'Baby Boomers'
demographics$generation[ind5] <- 'Silent Generation'
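
The five index vectors and assignments can also be written as a single call to base R's cut(). The sketch below is an alternative, not the method used above; the helper name assign_generation() is hypothetical, and it assumes AG is numeric. The right-closed intervals reproduce the age ranges in the table.

```r
# Map numeric ages to the generation groups in the table using base R cut().
# Intervals are right-closed: (-Inf, 24], (24, 43], (43, 55], (55, 74], (74, Inf]
assign_generation <- function(age) {
  as.character(cut(age,
                   breaks = c(-Inf, 24, 43, 55, 74, Inf),
                   labels = c("Generation Z", "Millennials", "Generation X",
                              "Baby Boomers", "Silent Generation"),
                   right = TRUE))
}

# Example usage on the lower bounds of a few 5-year age bins
assign_generation(c(0, 20, 25, 40, 45, 60, 75, 90))
```

It would be applied as demographics$generation <- assign_generation(demographics$AG). Ages outside every interval (for example NA) come back as NA rather than silently keeping an earlier value.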

To get the percentage of a planning area's population in each generation group, we first group the data by the “PA” column and store the planning area's total population in a new column, “areapop”.

The data is then also grouped by the new “generation” column, and each row's population is divided by its planning area's total. The result is stored in a new column called “Pop/areapop”. Summed over all rows of a generation group, these values give the percentage of the planning area's population that belongs to that group.

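demographics <- demographics %>%
  group_by(PA) %>%
  # total population of each planning area, repeated on every row of that area
  mutate(areapop = sum(Pop)) %>%
  group_by(generation, .add = TRUE) %>%
  # each row's share of its planning area's population
  mutate(`Pop/areapop` = Pop / areapop) %>%
  ungroup()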
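
A small made-up example (not the SingStat figures) shows why the per-row “Pop/areapop” values give the generation percentages: summing them within a planning area and generation gives that generation's share, and the shares of each planning area add up to 1. The sketch uses base R (ave() and aggregate()) so that it runs without extra packages; the column name row_share is hypothetical.

```r
# Made-up toy data: several rows per planning area,
# like the subzone/age/sex/dwelling rows in the real dataset
toy <- data.frame(
  PA         = c("A", "A", "A", "B", "B"),
  generation = c("Generation Z", "Generation Z", "Millennials",
                 "Generation Z", "Millennials"),
  Pop        = c(30, 20, 50, 10, 30),
  stringsAsFactors = FALSE
)

# Total population of each planning area, repeated on every row of that area
toy$areapop <- ave(toy$Pop, toy$PA, FUN = sum)
# Each row's share of its planning area (the "Pop/areapop" values)
toy$row_share <- toy$Pop / toy$areapop

# Summing the row shares within each planning area and generation
# gives that generation's share of the planning area
shares <- aggregate(row_share ~ PA + generation, data = toy, FUN = sum)
shares
```

Here planning area A splits 50/50 between the two generations, while B is 25% Generation Z and 75% Millennials.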

The DT package is installed if needed and used to display the data as an interactive table, giving a clear view of the current state of the dataset. The first 6 rows of the data are shown below.

# install DT if it is missing, then load it
if (!require("DT")) install.packages("DT")
library(DT)
datatable(head(demographics), class = 'cell-border stripe')

3.4 Generate plot

The percentage stacked bar chart is generated with ggplot and given the title “Percentage of people by generation group and planning area”. The planning area is mapped to the x-axis and the “Pop/areapop” values to the y-axis. The fill is mapped to the generation groups so that each segment of a bar is color coded by generation, and position = "fill" rescales each bar so that its segments add up to 100%. coord_flip() then swaps the x and y axes so that the planning area names are listed vertically and stay readable. Finally, scale_y_continuous(expand = c(0, 0)) is added to remove ggplot's default padding so that the bars start exactly at 0.

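ggplot(demographics, aes(x = PA, y = Pop/areapop, fill = generation)) +
  ggtitle("Percentage of people by generation group and planning area") +
  theme(plot.title = element_text(hjust = 0.5)) +
  # ColorBrewer "Set2" palette instead of ggplot's default hues
  scale_fill_brewer(palette = "Set2") +
  # stack each planning area's rows into one bar scaled to 100%
  geom_bar(position = "fill", stat = "identity") +
  xlab("Planning Area") + ylab("% of people by generation") +
  # put planning areas on the vertical axis so their names are readable
  coord_flip() +
  # no padding, so bars start at 0; show the axis as percentages
  scale_y_continuous(expand = c(0, 0), labels = scales::percent)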

4. Insights

The percentage stacked bar chart gives a clear overview of how the generation groups are distributed across Singapore's planning areas. It also hints at which areas may need the most attention as the number of elderly residents (Silent Generation) grows.

Before going into the insights, note that planning areas with a population of 0 are excluded from the chart.

  1. Baby Boomers, Generation X, Generation Z and Millennials seem to make up most of the population and are well spread out across the planning areas. Since Baby Boomers and Generation X are the next groups to reach old age, there seems to be a need to start planning for an aging population across all planning areas of Singapore.

  2. The Silent Generation has the smallest population in Singapore, but it also seems to be well spread out across the planning areas. However, some planning areas have nobody from the Silent Generation. This may be because the facilities in these planning areas are not suitable for the elderly. These planning areas should review their current facilities and see what can be improved to cater to an aging population.

  3. Lim Chu Kang has the highest percentage of Baby Boomers of any planning area. As the Baby Boomers grow older and join the elderly population, Lim Chu Kang should prepare for a large increase in elderly residents in the coming years and cater to their needs.
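
Insights such as the third one can also be checked directly from the data instead of being read off the chart. A minimal base R sketch, assuming the data has first been reduced to one row per planning area and generation with a share column; the helper name top_area_for(), the column name share and the example values are all hypothetical, not the 2019 figures.

```r
# Hypothetical helper: `shares` has one row per planning area and generation
# with columns PA, generation and share; returns the planning area in which
# generation `gen` has the largest share (the first one if there is a tie).
top_area_for <- function(shares, gen) {
  rows <- shares[shares$generation == gen, ]
  rows$PA[which.max(rows$share)]
}

# Made-up example data, not the 2019 figures
example <- data.frame(
  PA         = c("Area 1", "Area 1", "Area 2", "Area 2"),
  generation = c("Baby Boomers", "Millennials", "Baby Boomers", "Millennials"),
  share      = c(0.20, 0.80, 0.45, 0.55),
  stringsAsFactors = FALSE
)

top_area_for(example, "Baby Boomers")  # "Area 2"
```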