Singapore’s total population was estimated at 5.7 million in 2019. In this article, we use ggplot2 and other R packages to better understand the city-state’s population makeup.
▪ Challenge 1: The raw data cover several data types, such as cross-sectional and time-series data. Deciding how best to visualise each of them was therefore one of the challenges. To overcome this, a sketch design was drawn first (included in the section below) to plan which type of chart to use.
▪ Challenge 2: Another design challenge was choosing a layout for the multiple charts that best delivers the message the author is trying to convey. This was overcome with the ‘gridExtra’ package, which combines all the charts into a single visualisation.
The proposed sketch design is a hand-drawn plan that lays out the final visualisation in brief. The visualisation built with ggplot2 and other R packages follows the structure and specific points indicated in this sketch, although the final result may differ slightly from it.
▪ ggplot2 is used primarily to build the charts. It is a system for declaratively creating graphics, based on The Grammar of Graphics.
▪ dplyr is used for data manipulation. It provides a consistent set of verbs that help solve common data manipulation challenges.
▪ scales maps data to aesthetics and provides methods for automatically determining breaks and labels for axes and legends.
▪ lubridate is used for handling data that involve dates and times.
▪ ggpubr provides easy-to-use functions for creating and customizing ‘ggplot2’-based, publication-ready plots.
packages <- c('ggplot2', 'dplyr', 'scales', 'lubridate', 'ggpubr')
# install any package that is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
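# import population data by residency status (one column per year, X2015 to X2019)
df <- read.csv("df.csv")
# extract data for 2019 only
df1 <- data.frame(status = df$status, value = df$X2019)
# import data on population growth rate
dtgrowth <- read.csv("total_pop_growth_rate.csv")
# convert each year to a Date (1 January) so it can be used with scale_x_date()
dtgrowth$year <- as.Date(ISOdate(dtgrowth$year, 1, 1))
# convert the rate from percent to a proportion
dtgrowth$rate <- dtgrowth$rate * 0.01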
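The date conversion above uses base R’s `ISOdate()`. Because lubridate is already loaded, the same step could also be written with `lubridate::make_date()`. The sketch below assumes the growth-rate CSV has an integer `year` column and a `rate` column in percent, as the percent-to-proportion step implies. The one-row data frame `dtgrowth_demo` is a hypothetical stand-in for the CSV, and its only value is the 2019 rate of 1.2% quoted in this article.

```r
library(lubridate)

# Hypothetical stand-in for read.csv("total_pop_growth_rate.csv"):
# assumed columns are `year` (integer) and `rate` (growth rate in percent)
dtgrowth_demo <- data.frame(year = 2019L, rate = 1.2)

# Turn each year into a Date on 1 January, the type scale_x_date() expects
dtgrowth_demo$year <- make_date(year = dtgrowth_demo$year, month = 1L, day = 1L)

# Convert percent to a proportion so scales::percent() can format it later
dtgrowth_demo$rate <- dtgrowth_demo$rate / 100
```

`make_date()` is vectorised, so the same call works unchanged on the full column of years.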
# Basic pie chart of the 2019 population makeup by residency status
g1 <- ggplot(df1, aes(x = "", y = value, fill = status)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +
  labs(title = "Singapore population hit 5.7M in 2019") +
  theme(title = element_text(face = "bold", size = 12, family = "Helvetica", color = "black")) +
  theme(legend.title = element_text(color = "black", size = 9),
        legend.text = element_text(color = "black", family = "Helvetica")) + # change legend text format
  labs(fill = "Residency Status") + # change legend title
  theme(plot.title = element_text(vjust = -6, hjust = 0.5)) + # change chart title position
  geom_text(aes(label = paste(status, round(value / 1000000, digits = 2), "M")), hjust = 0.58,
            position = position_stack(vjust = 0.5), color = "white",
            family = "Helvetica", fontface = "bold", size = 2.6) +
  scale_fill_manual(values = c("#ffa600", "#858800", "#205c0c")) +
  theme(panel.background = element_blank(), axis.title = element_blank(),
        axis.text = element_blank())
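Because `g1` depends on `df.csv`, a reader without that file cannot run it. The sketch below is a minimal, self-contained version of the same pie-chart technique: a single stacked bar wrapped by `coord_polar()`. Its inline data are the rounded 2019 figures quoted later in this article (3.50M citizens, 0.53M PRs, 1.68M non-residents). The object names `pie_data` and `pie_chart` are illustrative, and the font and most theme tweaks are left out.

```r
library(ggplot2)

# 2019 population by residency status, in persons, using the rounded
# figures quoted in the article (not the unrounded official counts)
pie_data <- data.frame(
  status = c("Singapore Citizen", "PR", "Non-resident"),
  value  = c(3.50e6, 0.53e6, 1.68e6)
)

pie_chart <- ggplot(pie_data, aes(x = "", y = value, fill = status)) +
  # one full-width bar whose segments are stacked by status
  geom_col(width = 1, colour = "white") +
  # wrap the y axis around a circle so the stacked bar becomes a pie
  coord_polar(theta = "y", start = 0) +
  # label each slice at the middle of its segment, in millions
  geom_text(aes(label = paste(status, round(value / 1e6, 2), "M")),
            position = position_stack(vjust = 0.5), colour = "white", size = 2.6) +
  scale_fill_manual(values = c("#ffa600", "#858800", "#205c0c")) +
  labs(title = "Singapore population hit 5.7M in 2019", fill = "Residency Status") +
  theme_void()
```

`theme_void()` removes the axes, gridlines and background in one call. The original code gets the same effect with several `element_blank()` settings.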
# create a long-format data frame (one row per year and status) for the stacked bar chart
year <- rep(c("2015", "2016", "2017", "2018", "2019"), each = 3)
# take the status labels from df so they stay aligned with the values below
status <- rep(df$status, times = 5)
value <- c(df$X2015, df$X2016, df$X2017, df$X2018, df$X2019)
data <- data.frame(year, status, value)
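The lines above build the long-format table by hand from each year’s column. The sketch below is an alternative in base R that does the same wide-to-long reshape with `utils::stack()`, which picks up every `X<year>` column without listing the years. It assumes `df` has a `status` column plus one `X<year>` column per year, as `read.csv()` produces for numeric headers. The names `df_wide` and `data_long` are hypothetical, and the numbers are arbitrary placeholders rather than population figures.

```r
# Toy stand-in for df: one row per residency status, one column per year.
# The values are arbitrary placeholders.
df_wide <- data.frame(
  status = c("Singapore Citizen", "PR", "Non-resident"),
  X2015 = c(1, 2, 3),
  X2016 = c(4, 5, 6),
  stringsAsFactors = FALSE
)

year_cols <- grep("^X[0-9]{4}$", names(df_wide), value = TRUE)

# stack() concatenates the selected columns one after another (all of X2015,
# then all of X2016, ...) and records each value's source column in `ind`
long <- stack(df_wide[year_cols])

data_long <- data.frame(
  year   = sub("^X", "", as.character(long$ind)),
  # repeat the status column once per year so it lines up with `values`
  status = rep(df_wide$status, times = length(year_cols)),
  value  = long$values,
  stringsAsFactors = FALSE
)
```

Because `stack()` keeps the column order, `rep(..., times = ...)` on the status column stays aligned with the values. Adding a new year to the CSV then needs no code change.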
# total population for each year, used as the data label above each bar
totals <- data %>%
  group_by(year) %>%
  summarize(total = sum(value))
# express the totals in millions
totals$total <- round(totals$total / 1000000, digits = 2)
g2 <- ggplot(data, aes(fill = status, y = value, x = year)) +
  geom_bar(position = "stack", stat = "identity") +
  labs(title = "Consistent growth in Citizen & stable PR population", x = "",
       y = "Population") + # edit chart title and axis title
  theme(title = element_text(face = "bold.italic", family = "Helvetica", color = "black", size = 7.6)) + # format chart title text
  theme(axis.title.y = element_text(face = "italic", family = "Helvetica", color = "black", size = 8)) +
  theme(plot.title = element_text(vjust = 5, hjust = 0.8)) + # change chart title position
  scale_y_continuous(labels = label_number(scale = 1e-6, suffix = " M")) + # show y-axis labels in millions
  theme(axis.text.x = element_text(face = "bold", family = "Helvetica", color = "black", size = 8),
        axis.text.y = element_text(family = "Helvetica", color = "black", size = 6)) + # format axis text style
  guides(fill = guide_legend(reverse = TRUE)) + # reverse legend order
  geom_text(aes(label = paste(round(value / 1000000, digits = 2), "M")),
            position = position_stack(vjust = 0.5), color = "white",
            family = "Helvetica", fontface = "bold", size = 3) + # add and format the data labels
  scale_fill_manual(values = c("#ffa600", "#858800", "#205c0c")) + # change bar fill colour
  theme(legend.position = "none") +
  theme(panel.background = element_blank()) + # remove background
  # format axis lines (`linewidth` needs ggplot2 >= 3.4.0; older versions use `size`)
  theme(axis.line = element_line(colour = "black", linewidth = 0.4, linetype = "solid")) +
  # label each bar with the annual total; totals are in millions, so scale back up for the y position
  geom_text(data = totals, aes(x = year, y = total * 1e6, label = paste(total, "M")),
            inherit.aes = FALSE, vjust = -0.5,
            family = "Helvetica", size = 3.5, color = "black")
# Time series point-and-line chart of the annual population growth rate
g3 <- ggplot(dtgrowth, aes(x = year, y = rate)) +
  geom_line(color = "#757575", linewidth = 0.5) + # `linewidth` needs ggplot2 >= 3.4.0
  geom_point(color = "#757575", size = 1.5) +
  scale_y_continuous(labels = percent) +
  labs(title = "Increasing population growth rate from 2017 to 2019", x = "",
       y = "Population Growth Rate", caption = "Source: Department of Statistics Singapore") +
  theme(plot.title = element_text(face = "bold.italic", family = "Helvetica", color = "black", size = 9)) +
  theme(axis.title.y = element_text(face = "italic", family = "Helvetica", color = "black", size = 8)) + # format y-axis title
  scale_x_date(limits = c(as.Date("2010-01-01"), as.Date("2019-01-01")),
               date_breaks = "1 year", date_labels = "%Y") +
  theme(axis.text.x = element_text(face = "bold", family = "Helvetica", color = "black", size = 6.8, hjust = 0.5), # format x-axis labels
        axis.text.y = element_text(family = "Helvetica", color = "black", size = 6)) + # format y-axis labels
  geom_text(aes(label = percent(rate, accuracy = 0.1)), position = position_nudge(y = 0.002), hjust = 0.5,
            color = "black", family = "Helvetica", fontface = "bold", size = 2.7) + # label each point with its rate
  theme(panel.background = element_blank()) +
  theme(axis.line = element_line(colour = "black", linewidth = 0.4, linetype = "solid")) + # format axis lines
  theme(plot.title = element_text(vjust = 5, hjust = 0.6), # change chart title position
        plot.caption = element_text(face = "italic", family = "Helvetica", color = "black", size = 7, vjust = 2))
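This article does not show the code that combines `g1`, `g2` and `g3`. Challenge 2 names gridExtra for this step, so the sketch below uses `gridExtra::arrangeGrob()` with a `layout_matrix`. Two things are assumptions here: the helper name `combine_charts()`, and the layout (pie chart on the left spanning both rows, bar chart top-right, line chart bottom-right). The original sketch design may arrange the panels differently. gridExtra is not in the `packages` vector above, so it may need to be installed and loaded separately.

```r
library(ggplot2)   # the charts being combined are ggplot objects
library(gridExtra) # arrangeGrob() for multi-panel layouts
library(grid)      # grid.draw() to render the combined figure

# Hypothetical helper: place three charts in one figure.
# Assumed layout: p1 fills the left column across both rows,
# p2 sits top-right and p3 bottom-right.
combine_charts <- function(p1, p2, p3) {
  arrangeGrob(
    grobs = list(p1, p2, p3),
    layout_matrix = rbind(c(1, 2),
                          c(1, 3))
  )
}
```

With the charts above, `grid.draw(combine_charts(g1, g2, g3))` would draw the combined figure on the current device. Each number in `layout_matrix` refers to a position in `grobs`, so changing the layout only means editing that matrix.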
After constructing the three individual charts for this exercise, the final visualisation can be seen as follows:
The final visualisation shows that in 2019 there were 3.50 million Singapore citizens. Together with the 0.53 million permanent residents (PRs), they made up a resident population of 4.03 million. The 1.68 million non-residents include dependants, international students, and individuals who are in Singapore to work. Altogether, these groups made up the total population of 5.7 million in 2019.
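The totals in the paragraph above can be checked with a few lines of arithmetic. The sketch below uses only the rounded 2019 figures (in millions) quoted there. Summing the rounded components gives 5.71 million, which rounds to the 5.7 million headline figure.

```r
# 2019 figures from the article, in millions of people (already rounded)
citizens      <- 3.50
prs           <- 0.53
non_residents <- 1.68

residents <- citizens + prs           # resident population (citizens + PRs)
total     <- residents + non_residents  # whole population

round(residents, 2)
round(total, 1)
```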
The citizen population also grew steadily from 2015 to 2019, while the PR population stayed relatively constant.
Furthermore, the 2019 population growth rate of 1.2% came after the rate rose in two consecutive years, bringing it back to roughly the same level as in 2014-2016.
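The growth rates discussed above are read directly from the Department of Statistics file rather than computed in this article. For reference, an annual growth rate is usually defined as the change from the previous year relative to that year’s population, r_t = (P_t - P_{t-1}) / P_{t-1}. The sketch below implements that definition for a vector of annual totals. The function name `growth_rate()` and the toy series are hypothetical, and this article does not confirm that the published rates are computed exactly this way.

```r
# Year-on-year growth rate: r_t = (P_t - P_{t-1}) / P_{t-1}.
# The first year has no predecessor, so its rate is NA.
growth_rate <- function(pop) {
  c(NA, diff(pop) / head(pop, -1))
}

# Toy population series (hypothetical values, not official data)
toy_pop <- c(100, 101, 102.01)
growth_rate(toy_pop)
```

Multiplying the result by 100 gives the percentage form used in the line chart labels.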