Overview

The total population in Singapore was estimated at 5.7 million people in 2019. In this article, we use ggplot2 and other R packages to better understand the city-state’s population makeup.

Design challenges

Challenge 1: The raw data span different data types, such as cross-sectional and time-series data, so choosing the best way to visualise each of them is a challenge. To overcome this, a sketch design was drawn up (included in the section below) to plan the type of chart used for each dataset.

Challenge 2: Another design challenge was determining the best layout for the multiple charts so that they deliver the message the author is trying to convey. This was overcome with the help of the ‘gridExtra’ package, which combines all the charts into a single visualisation.
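As a rough illustration of Challenge 2, the sketch below arranges three ggplot2 charts in one figure with `gridExtra::arrangeGrob()`. The toy data frame and the placeholder plots `p1`, `p2` and `p3` are hypothetical stand-ins for the article's `g1`, `g2` and `g3`, and the two-over-one layout matrix is an assumption, not necessarily the author's exact layout.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical toy data standing in for the real population data
toy <- data.frame(group = c("a", "b", "c"), step = 1:3, y = c(2, 4, 3))

# Placeholder charts (assumed stand-ins for g1, g2, g3)
p1 <- ggplot(toy, aes(x = "", y = y, fill = group)) + geom_col() + coord_polar("y")
p2 <- ggplot(toy, aes(x = group, y = y)) + geom_col()
p3 <- ggplot(toy, aes(x = step, y = y)) + geom_line() + geom_point()

# Assumed layout: p1 and p2 side by side on top, p3 spanning the bottom row
layout <- rbind(c(1, 2),
                c(3, 3))
combined <- arrangeGrob(p1, p2, p3, layout_matrix = layout)

# Draw the combined figure on a fresh page
grid::grid.newpage()
grid::grid.draw(combined)
```

With the real charts, `arrangeGrob(g1, g2, g3, layout_matrix = layout)` would take the place of the placeholders; `grid.arrange()` accepts the same arguments but draws the result immediately.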

Proposed sketch design

The proposed sketch design is a hand-drawn plan that lays out the final visualisation in brief. The visualisation built with ggplot2 and other R packages follows the structure and specific points indicated in this sketch, although the final result may vary slightly.

Preparation for visualisation

1. Install and load R packages

ggplot2 is used primarily for building graphs, as it is a system for declaratively creating graphics based on The Grammar of Graphics.
dplyr is used for data manipulation; it provides a consistent set of verbs that help solve common data manipulation challenges.
scales maps data to aesthetics and provides methods for automatically determining breaks and labels for axes and legends.
lubridate is used for handling date and time data.
ggpubr provides easy-to-use functions for creating and customising ‘ggplot2’-based, publication-ready plots.

packages <- c('ggplot2', 'dplyr', 'scales', 'lubridate', 'ggpubr')

# install each package that is not yet available, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
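A slightly more compact variant of the loading loop is sketched below. It uses `requireNamespace()` to find which packages are not installed, installs them in a single `install.packages()` call, and only then attaches everything. The helper names `missing_packages()` and `load_packages()` are hypothetical and not part of any package.

```r
# Return the subset of `pkgs` that cannot be loaded (i.e. are not installed)
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Install whatever is missing in one call, then attach every package
load_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```

With these helpers, the loop could be replaced by `load_packages(packages)`. Checking with `requireNamespace()` avoids attaching a package twice and avoids the warning that `require()` emits for each missing package.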

2. Import data

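# import the population data by residency status
df <- read.csv("df.csv")
# extract data for 2019 only (read.csv() renames the "2019" column header to X2019)
df1 <- data.frame(status = df$status, value = df$X2019)
# import data on the population growth rate
dtgrowth <- read.csv("total_pop_growth_rate.csv")
# convert each year to a Date (1 January) so it can be used with scale_x_date()
dtgrowth$year <- as.Date(ISOdate(dtgrowth$year, 1, 1))
# convert the growth rate from percent to a proportion for scales::percent
dtgrowth$rate <- dtgrowth$rate * 0.01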
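Two details of this import step are easy to miss: `read.csv()` turns a column header such as `2019` into `X2019` (headers pass through `make.names()` because `check.names = TRUE` by default), and the growth rates arrive as percentages, so they are converted to proportions and their years to `Date` values before plotting. The self-contained sketch below shows both on a tiny inline CSV; all numbers are made up, not the real statistics.

```r
# Tiny inline CSV with a year-named column (made-up numbers)
csv_text <- "status,2019
Citizen,10
PR,2
Non-resident,5"
df_demo <- read.csv(text = csv_text)
# check.names = TRUE turns the header "2019" into the syntactic name "X2019"
print(names(df_demo))

# Made-up growth rates in percent, keyed by year
growth_demo <- data.frame(year = c(2018, 2019), rate = c(0.5, 1.2))
# ISOdate() builds 1 January of each year; as.Date() drops the time part
growth_demo$year <- as.Date(ISOdate(growth_demo$year, 1, 1))
# percent to proportion, e.g. 1.2 becomes 0.012
growth_demo$rate <- growth_demo$rate * 0.01
print(growth_demo)
```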

3. Construct visualisation

3.1 Pie chart – Population composition of Singapore in 2019
# Basic PIE chart for 2019 population makeup by residency status
g1 <-ggplot(df1, aes(x="", y=value, fill=status)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  labs(title = "Singapore population hit 5.7M in 2019")+ 
  theme(title = element_text(face = "bold", size=12,family = "Helvetica",color = "black"))+
  theme(legend.title = element_text(color = "black", size = 9),
        legend.text = element_text(color = "black",family = "Helvetica"))+ #change legend text format
  labs(fill = "Residency Status")+ ##change legend title
  theme(plot.title = element_text(vjust = -6,hjust=0.5))+ # change chart title position
  geom_text(aes(label = paste(status,round(value/1000000,digits = 2),"M")),hjust=0.58,
            position = position_stack(vjust = 0.5),color="white",
            family = "Helvetica",fontface = "bold",size=2.6)+
  scale_fill_manual(values=c("#ffa600", "#858800", "#205c0c"))+
  theme(panel.background = element_blank(),axis.title = element_blank(),
        axis.text = element_blank())
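`coord_polar("y")` wraps the single stacked bar around a circle, so each slice's angle is proportional to that group's share of the total. The arithmetic is sketched below in base R, using the rounded 2019 figures quoted in the insights section of this article (3.50 M citizens, 0.53 M PRs, 1.68 M non-residents) and the same label format as the `geom_text()` call above. The status names are assumptions; the real ones come from `df$status`.

```r
# Rounded 2019 figures (in people) as quoted in the insights section
pop_2019 <- c("Singapore Citizen" = 3500000, "PR" = 530000, "Non-resident" = 1680000)

# Share of the total and the corresponding slice angle in degrees
share <- pop_2019 / sum(pop_2019)
angle <- 360 * share

# Same label format as the pie chart: status, value in millions, "M"
slice_labels <- paste(names(pop_2019), round(pop_2019 / 1000000, digits = 2), "M")

print(round(angle, 1))
print(slice_labels)
```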
3.2 Stacked bar chart – Historical population composition over the past 5 years
1) Data wrangling before constructing the stacked bar chart
# create a new long-format data frame for building the stacked bar chart:
# one row per (year, residency status) pair
year <- rep(as.character(2015:2019), each = 3)
# reuse the status labels from df (assumes one row per status, same order in every year column)
status <- rep(df$status, times = 5)
value <- c(df$X2015, df$X2016, df$X2017, df$X2018, df$X2019)
data <- data.frame(year, status, value)
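The construction above assumes exactly five year columns and three status rows. A more general base-R sketch of the same wide-to-long reshape is shown below; the toy `df_wide` and its values are hypothetical, but its column naming (`X2015` to `X2019`) mirrors what `read.csv()` produces.

```r
years <- 2015:2019

# Hypothetical wide table: one row per status, one X<year> column per year
df_wide <- data.frame(status = c("Singapore Citizen", "PR", "Non-resident"))
for (y in years) {
  df_wide[[paste0("X", y)]] <- c(10, 2, 5) + (y - 2015)  # made-up values
}

year_cols <- paste0("X", years)
long <- data.frame(
  # each year repeated once per status row
  year = rep(as.character(years), each = nrow(df_wide)),
  # the status labels cycle once per year
  status = rep(df_wide$status, times = length(years)),
  # unlist() concatenates the year columns in order, matching the two vectors above
  value = unlist(df_wide[year_cols], use.names = FALSE)
)
print(head(long, 6))
```

Deriving the repetition counts from `nrow()` and `length(years)` keeps the three columns aligned if a year or a status is added later.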

# sum of population for every year -- to be used as data label for each bar
totals <- data %>%
  group_by(year) %>%
  summarize(total = sum(value))
# express the number in Millions
totals$total <- round(totals$total/1000000,digits = 2)
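For readers not using dplyr, the per-year totals can be computed in base R with `aggregate()`, used here in place of `group_by()` plus `summarize()`. The sketch below runs on a small long-format table with the same `year`/`status`/`value` columns; the values are made up.

```r
# Made-up long-format data: two years, three residency statuses
data_demo <- data.frame(
  year = rep(c("2018", "2019"), each = 3),
  status = rep(c("Singapore Citizen", "PR", "Non-resident"), times = 2),
  value = c(3000000, 500000, 1500000, 3100000, 510000, 1600000)
)

# Sum value within each year (base-R counterpart of group_by() + summarize())
totals_demo <- aggregate(value ~ year, data = data_demo, FUN = sum)
names(totals_demo)[2] <- "total"

# Express totals in millions, rounded to two decimals
totals_demo$total <- round(totals_demo$total / 1000000, digits = 2)
print(totals_demo)
```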
2) Chart building
g2 <-ggplot(data, aes(fill=status, y=value, x=year)) + 
  geom_bar(position="stack", stat="identity")+
  labs(title = "Consistent growth in Citizen & stable PR population", x = "", 
       y = "Population")+ #edit chart title and axis title
  theme(title = element_text(face = "bold.italic", family = "Helvetica",color = "black",size=7.6))+ # format chart title text
  theme(axis.title.y = element_text(face="italic",family = "Helvetica", color="black", size=8))+
  theme(plot.title = element_text(vjust=5,hjust=0.8))+ # change chart title position
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6))+ # change the unit of y-axis label to millions
  theme(axis.text.x = element_text(face="bold",family = "Helvetica", color="black", size=8),
        axis.text.y = element_text(family = "Helvetica", color="black", size=6))+ # format axes text style
  guides(fill = guide_legend(reverse=TRUE))+ ## reverse legend order
  geom_text(aes(label = paste(round(value/1000000,digits = 2),"M")),position = position_stack(vjust = 0.5),color="white",
          family = "Helvetica",fontface = "bold",size=3)+ # add data label & format the data label
  scale_fill_manual(values=c("#ffa600", "#858800", "#205c0c"))+ # change bar fill colour
  theme(legend.position = "none")+
  theme(panel.background = element_blank())+ # remove background
  theme(axis.line = element_line(colour = "black",
                                 linewidth = 0.4, linetype = "solid"))+ # format axis line (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  geom_text(aes(x = year, y = total * 1020000, label = paste(total, "M")), data = totals,
            inherit.aes = FALSE, family = "Helvetica", size = 3.5, color = "black") # annual total label: total is in millions, so scale it back up and place it about 2% above the bar
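The label for each bar total relies on a unit trick: `totals$total` is stored in millions (e.g. 5.7), while the bars are drawn in raw counts (about 5.7 million). Multiplying by 1,020,000 both undoes the conversion to millions and lifts the label about 2% above the top of the bar. A quick check of that arithmetic, with a total of 5.7 million as an example:

```r
total_m <- 5.7                # bar total in millions, as stored in totals$total
bar_top <- total_m * 1000000  # where the bar actually ends, in people
label_y <- total_m * 1020000  # y position used for the total label

print(label_y)
# ratio of label height to bar height
print(label_y / bar_top)
```

Because the offset is proportional, taller bars get a slightly larger absolute gap; for five bars of similar height this is hardly noticeable.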
3.3 Point and line chart – Change in Singapore’s population growth over the past 10 years
# Time series point and line chart
g3 <-ggplot(dtgrowth, aes(x=year, y=rate)) +
  geom_line(color="#757575",linewidth=0.5) + # linewidth needs ggplot2 >= 3.4.0; use size on older versions
  geom_point(color="#757575",size=1.5)+
  scale_y_continuous(labels = percent)+
  labs(title = "Increasing population growth rate from 2017 to 2019", x = "", 
       y = "Population Growth Rate",caption="Source: Department of Statistics Singapore")+
  theme(plot.title = element_text(face = "bold.italic", family = "Helvetica",color = "black",size=9))+
  theme(axis.title.y = element_text(face="italic",family = "Helvetica", color="black", size=8))+ #format y-axis title
  scale_x_date(limits=c(as.Date("2010-01-01"),as.Date("2019-01-01")), date_breaks = "1 year",date_labels = "%Y") +
  theme(axis.text.x = element_text(face="bold",family = "Helvetica", color="black", size=6.8,hjust=0.5), #format x-axis labels
        axis.text.y = element_text(family = "Helvetica", color="black", size=6))+ #format y-axis label
  geom_text(aes(label = paste(round(rate*100,digits = 1),"%")),position = position_nudge(y =0.002),hjust=0.5,
            color="black",family = "Helvetica",fontface = "bold",size=2.7)+
  theme(panel.background = element_blank())+
  theme(axis.line = element_line(colour = "black",
                                 linewidth = 0.4, linetype = "solid"))+ # format axis line (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  theme(plot.title = element_text(vjust=5,hjust=0.6), # change chart title position
        plot.caption = element_text(face = "italic", family = "Helvetica",color = "black",size=7,vjust=2))
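The chart plots growth rates that are already computed in `total_pop_growth_rate.csv`. If only yearly population totals were available, an annual growth rate could be derived as the percentage change from the previous year. The sketch below assumes that definition; the function name `annual_growth_rate()` and the population figures are hypothetical.

```r
# Year-over-year growth rate as a proportion; the first year has no predecessor
annual_growth_rate <- function(pop) {
  c(NA, diff(pop) / head(pop, -1))
}

pop <- c(5000000, 5050000, 5110600)  # made-up yearly totals
rate <- annual_growth_rate(pop)

# Same label format as the point labels in g3: percent rounded to one decimal
point_labels <- paste(round(rate[-1] * 100, digits = 1), "%")
print(point_labels)
```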

Final visualisation and insights

After constructing the three individual charts for this exercise, the final visualisation can be seen as follows:

Looking at the final visualisation, we can see that in 2019 there were 3.50 million Singapore citizens and 0.53 million permanent residents (PRs), for a total of 4.03 million residents. The 1.68 million non-residents include dependants, international students, and individuals who are in Singapore to work. Together, these groups made up the 5.7 million population in 2019.

We also see that the citizen population grew steadily from 2015 to 2019, while the PR population stayed relatively constant.

Furthermore, we observe that the population growth rate rose for two consecutive years to reach 1.2% in 2019, bringing it back to roughly the same level as in 2014-2016.