Singapore’s population is multi-ethnic. One of the key indicators of Singapore’s economic success is the inclusiveness of its public housing.
Through this data visualization exercise:
This document provides executable R code. The packages used in this exercise are dplyr, tidyverse, and ggplot2.
Rpubs link: https://rpubs.com/sashankramesh/642489
Instructions: To run the file locally, set the working directory using the provision in line #84.
Population growth in Singapore
Population growth in Singapore by Housing Type
Population growth in Singapore by Planning Area
The data source is https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data. Before diving into the visualizations, we had to address several data and design challenges. First, let’s look at the data challenges:
Initially, the dataset contained separate categories for different types of HDB properties. We grouped all HDB property types into a single HDB category, and the other property types into Condominiums, HUDC Properties, Landed Properties, and Others.
The rationale behind this grouping was simplification.
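The grouping code itself is not shown in this document. Below is a minimal sketch of how it could be done with dplyr's case_when(). It assumes that the raw TOD labels for HDB flats all start with "HDB" and that the other labels contain "Condominium", "HUDC" or "Landed". These patterns are assumptions about the SingStat strings, so check them with unique(input$TOD) before use. The helper name group_tod() and the sample labels are hypothetical.

```r
library(dplyr)

# Hypothetical helper: collapse detailed dwelling-type labels into the five
# groups used in this exercise. The regex patterns are assumptions about the
# raw SingStat labels, not confirmed values.
group_tod <- function(tod) {
  case_when(
    grepl("^HDB", tod)        ~ "HDB",
    grepl("Condominium", tod) ~ "Condominiums",
    grepl("HUDC", tod)        ~ "HUDC Properties",
    grepl("Landed", tod)      ~ "Landed Properties",
    TRUE                      ~ "Others"
  )
}

# Illustrative labels only, not read from the dataset
raw <- c("HDB 3-Room Flats", "HDB 5-Room and Executive Flats",
         "Condominiums and Other Apartments", "HUDC Flats",
         "Landed Properties", "Others")
group_tod(raw)
```

Applied as input <- input %>% mutate(TOD = group_tod(TOD)), this would replace the detailed labels before any filtering or plotting.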
The ‘Pop’ column contains values of ‘0’, which indicate that no residents fall into that category. We drop these rows.
As a result, the dataset contains only the combinations of sub-groups, dwelling types, and age groups where the population is greater than 0.
We do not use age groups in the visualizations. Their format is not convenient for analysis, but because we do not use them, we leave it unchanged.
Install Packages and Load File
The packages needed for this visualization are dplyr, tidyverse, and ggplot2.
The input file is downloaded from SingStat. It records the resident population by dwelling type, age group, sex, planning area, and year. As mentioned earlier, some records have zero population, so we drop those rows.
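The package setup is not shown in the source. A minimal sketch that installs and loads the two packages the code actually calls follows. The CRAN mirror URL is an assumption; any mirror works.

```r
# Install the packages once if they are missing (mirror URL is an assumption)
for (pkg in c("dplyr", "ggplot2")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
}

# library(tidyverse) would attach both of these (and more); loading them
# individually keeps the dependencies of this exercise explicit
library(dplyr)
library(ggplot2)
```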
# Read the SingStat extract and keep only rows with a non-zero population
input <- read.csv("respopagesextod2011to2019.csv")
input <- input[input$Pop != 0, ]
Population growth is central to understanding the Singapore context. In this study, we look at population growth in absolute terms and in comparison with the other dimensions in the dataset. First, consider Figure 1, which shows the absolute growth of Singapore’s population from 2011 to 2019.
We use dplyr to manipulate the data. The steps to generate the chart are:
a) Time is read as an integer. We convert it to character so that years are treated as discrete values.
b) Select three columns: Time, property type (TOD), and Pop.
c) Group by Time and property type.
d) Sum the population within each group to show growth across years.
# Total resident population per year and property type
tempo <- input %>%
  mutate(Time = as.character(Time)) %>%
  select(Time, TOD, Pop) %>%
  group_by(Time, TOD) %>%
  summarise(pop_sum = sum(Pop), .groups = "drop")
# Summing across property types gives the total population for each year
ggplot(tempo, aes(x = Time, y = pop_sum)) +
  stat_summary(geom = "bar", fun = "sum", color = "black", fill = "lightblue") +
  labs(title = "Figure 1: Population in Singapore, 2011-2019",
       x = "Year", y = "Population", caption = "Source: SingStat")
In Figure 1, the population remains fairly constant from 2011 to 2016 but increases significantly from 2016 to 2019. Here, population refers to Singapore’s resident population.
Now let’s look at the growth of Singapore’s resident population by sex. The population in the chart below is shown in thousands (000s).
We again use dplyr to manipulate the data. The steps to generate the chart are:
a) Time is read as an integer. We convert it to character so that years are treated as discrete values.
b) Select three columns: Time, Sex, and Pop.
c) Group by Time and Sex.
d) Sum the population within each group to show growth across years.
temp <- input %>%
  mutate(Time = as.character(Time)) %>%
  select(Time, Sex, Pop) %>%
  group_by(Time, Sex) %>%
  summarise(pop_sum = sum(Pop), .groups = "drop")
ggplot(temp, aes(x = Time, y = pop_sum / 1000, color = Sex)) +
  geom_point(size = 3) +  # no jitter, so each point sits at its exact yearly total
  scale_color_manual(values = c("#F13B17", "#4472C4")) +
  labs(title = "Figure 2: Population in Singapore, 2011-2019", x = "Year",
       y = "Population (000s)", caption = "Source: SingStat") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5), plot.caption = element_text(hjust = 1, face = "italic"))
Figure 2: The population is shown in thousands. The resident population differs by sex: in every year from 2011 to 2019, there are more females than males. However, the female and male populations grow at a similar rate.
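The claim that both sexes grow at a similar rate can be checked numerically rather than by eye. Below is a minimal sketch that computes the year-over-year growth rate within each group. It assumes a data frame shaped like temp above (columns Time, a grouping column and pop_sum). The helper name yoy_growth() is hypothetical, and the toy values are made-up round numbers, not SingStat figures.

```r
library(dplyr)

# Hypothetical helper: year-over-year growth (%) of pop_sum within each group.
# Time is a character year, so alphabetical order matches chronological order.
yoy_growth <- function(df, group_col) {
  df %>%
    group_by({{ group_col }}) %>%
    arrange(Time, .by_group = TRUE) %>%
    mutate(growth_pct = 100 * (pop_sum / dplyr::lag(pop_sum) - 1)) %>%
    ungroup()
}

# Illustrative toy input (made-up numbers, not SingStat data)
toy <- data.frame(
  Time = rep(c("2011", "2012", "2013"), times = 2),
  Sex = rep(c("Females", "Males"), each = 3),
  pop_sum = c(1000, 1010, 1020.1, 500, 505, 510.05)
)
yoy_growth(toy, Sex)
```

The first year of each group has no predecessor, so its growth_pct is NA. On the real data, calling yoy_growth(temp, Sex) would give one growth figure per sex and year.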
We want to understand the types of properties and the population growth by property type.
First, we look at the distribution of population records in Singapore by property type. The steps are:
- Filter the input to 2019 only.
- Plot the 2019 subset with ggplot(), using Pop as the input variable.
- Use geom_histogram(bins=20, color='grey30') to show the distribution of Pop values.
- Use facet_wrap() to draw one panel per property type.
Second, we generate two vertical bar charts showing the median population by property type in Singapore in 2011 and 2019. The steps are: filter the data to each year, then plot the median Pop value for each property type with stat_summary(fun='median'), ordering the bars with reorder().
# Subsets for the two comparison years
prop <- input %>% filter(Time == 2019)
prop1 <- input %>% filter(Time == 2011)
# Figure 3: distribution of Pop values per property type, 2019
ggplot(data = prop, aes(x = Pop)) +
  geom_histogram(bins = 20, color = "grey30") +
  facet_wrap(~TOD) +
  labs(title = "Figure 3: Population in Singapore by Housing type, 2019")
#2011: median Pop per property type, bars ordered by descending mean Pop
ggplot(data = prop1, aes(x = reorder(TOD, -Pop), y = Pop)) +
  stat_summary(geom = "bar", fun = "median", color = "black", fill = "lightblue") +
  labs(title = "Figure 4: Population in Singapore by Housing type, 2011",
       x = "Property Type", y = "Population", caption = "Source: SingStat")
#2019: same chart for 2019
ggplot(data = prop, aes(x = reorder(TOD, -Pop), y = Pop)) +
  stat_summary(geom = "bar", fun = "median", color = "black", fill = "lightblue") +
  labs(title = "Figure 5: Population in Singapore by Housing type, 2019",
       x = "Property Type", y = "Population", caption = "Source: SingStat")
As the histogram in Figure 3 and the bar charts in Figures 4 and 5 indicate, HDB has the highest population as of 2019, and condominiums have the second highest.
The two vertical bar charts compare the median population by property type in 2011 and 2019. Interestingly, the median population in HDBs and condominiums is higher in 2019 than in 2011. This is likely related to the rise in Singapore’s population from 2016 to 2019.
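The comparison between Figures 4 and 5 can also be expressed as a small table of medians and their change. Below is a minimal sketch, assuming row-level data with the Time, TOD and Pop columns used above. The helper name median_change() is hypothetical, and the toy rows are made-up values, not SingStat data.

```r
library(dplyr)

# Hypothetical helper: median Pop per property type in two years, plus the change
median_change <- function(df, from = 2011, to = 2019) {
  df %>%
    filter(Time %in% c(from, to)) %>%
    group_by(TOD) %>%
    summarise(
      median_from = median(Pop[Time == from]),
      median_to = median(Pop[Time == to]),
      .groups = "drop"
    ) %>%
    mutate(change = median_to - median_from)
}

# Illustrative toy rows (made-up numbers, not SingStat data)
toy <- data.frame(
  TOD = c(rep("HDB", 6), rep("Condominiums", 4)),
  Time = c(2011, 2011, 2011, 2019, 2019, 2019, 2011, 2011, 2019, 2019),
  Pop = c(10, 20, 30, 20, 30, 40, 5, 15, 10, 20)
)
median_change(toy)
```

On the real data, median_change(input) would list the values behind Figures 4 and 5 side by side.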
Let’s now study the growth of population in HDBs and Condominiums from 2011 to 2019.
Growth in HDB:
- Use dplyr to filter the data frame to rows whose property type is HDB.
- Use geom_point() to generate a scatter plot of HDB population growth from 2011 to 2019.
Growth in Condominiums:
- Use dplyr to filter the data frame to rows whose property type is Condominiums.
- Use geom_point() to generate a scatter plot of condominium population growth from 2011 to 2019.
# Yearly total population living in HDB properties
tempor <- input %>%
  mutate(Time = as.character(Time)) %>%
  select(Time, TOD, Pop) %>%
  filter(TOD == "HDB") %>%
  group_by(Time, TOD) %>%
  summarise(pop_sum = sum(Pop), .groups = "drop")
ggplot(tempor, aes(x = Time, y = pop_sum / 1000, color = TOD)) +
  geom_point(size = 3) +  # no jitter, so each point sits at its exact yearly total
  scale_color_manual(values = c("#F13B17")) +
  labs(title = "Figure 6: Growth of HDB, 2011-2019", x = "Year",
       y = "Population (000s)", caption = "Source: SingStat") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5), plot.caption = element_text(hjust = 1, face = "italic"))
# Yearly total population living in condominiums
temporar <- input %>%
  mutate(Time = as.character(Time)) %>%
  select(Time, TOD, Pop) %>%
  filter(TOD == "Condominiums") %>%
  group_by(Time, TOD) %>%
  summarise(pop_sum = sum(Pop), .groups = "drop")
ggplot(temporar, aes(x = Time, y = pop_sum / 1000, color = TOD)) +
  geom_point(size = 3) +  # no jitter, so each point sits at its exact yearly total
  scale_color_manual(values = c("#4472C4")) +
  labs(title = "Figure 7: Growth of Condominiums, 2011-2019", x = "Year",
       y = "Population (000s)", caption = "Source: SingStat") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5), plot.caption = element_text(hjust = 1, face = "italic"))
At first glance, the data suggest a displacement from HDBs to condominiums between 2011 and 2019. Looking closer, the scatter plots show that HDBs and condominiums follow different growth patterns.
The HDB population increased slowly and steadily from 2011 to 2015. However, it dropped significantly from 2016 to 2018, and the 2019 figure shows signs of recovery.
In comparison, the condominium population has increased steadily from 2011 to 2019, and its growth appears to accelerate from 2017 to 2019.
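Figures 6 and 7 use different y-axis ranges, which makes it hard to compare the two growth patterns directly. One option is to rebase each property type to an index in which the first year equals 100. This is a swapped-in technique, not one used in the original analysis. Below is a minimal sketch, assuming a data frame shaped like tempor or temporar (columns Time, TOD and pop_sum). The helper name index_growth() is hypothetical, and the toy numbers are made up.

```r
library(dplyr)

# Hypothetical helper: rebase each property type's yearly total so that its
# first year equals 100, putting groups of very different size on one scale
index_growth <- function(df) {
  df %>%
    group_by(TOD) %>%
    arrange(Time, .by_group = TRUE) %>%
    mutate(index = 100 * pop_sum / first(pop_sum)) %>%
    ungroup()
}

# Illustrative toy input (made-up numbers, not SingStat data)
toy <- data.frame(
  Time = rep(c("2011", "2015", "2019"), times = 2),
  TOD = rep(c("Condominiums", "HDB"), each = 3),
  pop_sum = c(400, 440, 520, 3000, 3060, 3030)
)
index_growth(toy)
```

Binding tempor and temporar with bind_rows() and plotting index with geom_line(aes(group = TOD)) would show both growth paths on a single chart.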
Planning areas in Singapore include Yishun, Bedok, and Serangoon. We want to look at how the population is distributed by sex in each planning area in 2019.
Steps involved are:
- Use dplyr to convert Time to character.
- Group by planning area (PA), property type, Time, and Sex.
- Filter the data frame to 2019 only.
- Use geom_point() to generate the chart.
- Map color to Sex to distinguish the male and female populations in each planning area.
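The planning areas with an uneven split between males and females are identified visually in the charts. A ranked table can make that selection explicit. Below is a minimal sketch, assuming a data frame shaped like tempora in the code that follows (columns PA, Sex and pop_sum). It also assumes the Sex labels are "Females" and "Males". The helper name sex_gap() and the area names "Area A" and "Area B" are hypothetical, and the toy numbers are made up.

```r
library(dplyr)

# Hypothetical helper: female-minus-male population as a percentage of each
# planning area's total, ranked by the size of the gap.
# Assumes Sex values are "Females" and "Males".
sex_gap <- function(df) {
  df %>%
    group_by(PA) %>%
    summarise(
      females = sum(pop_sum[Sex == "Females"]),
      males = sum(pop_sum[Sex == "Males"]),
      .groups = "drop"
    ) %>%
    mutate(gap_pct = 100 * (females - males) / (females + males)) %>%
    arrange(desc(abs(gap_pct)))
}

# Illustrative toy input (made-up numbers, not SingStat data)
toy <- data.frame(
  PA = rep(c("Area A", "Area B"), each = 2),
  Sex = rep(c("Females", "Males"), times = 2),
  pop_sum = c(510, 490, 550, 450)
)
sex_gap(toy)
```

A positive gap_pct means more females than males. Sorting by the absolute value puts the most uneven planning areas at the top, whichever sex is larger.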
#HDB
# 2019 HDB resident population by planning area and sex
tempora <- input %>%
  mutate(Time = as.character(Time)) %>%
  filter(Time == "2019", TOD == "HDB") %>%
  group_by(PA, TOD, Time, Sex) %>%
  summarise(pop_sum = sum(Pop), .groups = "drop")
# One row per planning area and sex, so each point is that group's total
ggplot(data = tempora, aes(x = pop_sum / 1000, y = PA, color = Sex)) +
  geom_point(shape = 21, size = 2, fill = "lightblue") +
  scale_color_manual(values = c("#F13B17", "#4472C4")) +
  labs(x = "Population (000s)", y = "Planning Area", color = "Sex",
       title = "Figure 8: Singapore Resident Population (HDB) by Sex and Planning Area, 2019",
       caption = "Source: SingStat") +
  theme(axis.text = element_text(size = 4),
        plot.caption = element_text(hjust = 1, face = "italic"),
        plot.title = element_text(hjust = 0.5, size = 9))
#Condo
# 2019 condominium resident population by planning area and sex
tempora1 <- input %>%
  mutate(Time = as.character(Time)) %>%
  filter(Time == "2019", TOD == "Condominiums") %>%
  group_by(PA, TOD, Time, Sex) %>%
  summarise(pop_sum = sum(Pop), .groups = "drop")
# Divide by 1000 so the axis matches the "Population (000s)" label
ggplot(data = tempora1, aes(x = pop_sum / 1000, y = PA, color = Sex)) +
  geom_point(shape = 21, size = 2, fill = "lightblue") +
  scale_color_manual(values = c("#F13B17", "#4472C4")) +
  labs(x = "Population (000s)", y = "Planning Area", color = "Sex",
       title = "Figure 8: Singapore Resident Population (Condo) by Sex and Planning Area, 2019",
       caption = "Source: SingStat") +
  theme(axis.text = element_text(size = 4),
        plot.caption = element_text(hjust = 1, face = "italic"),
        plot.title = element_text(hjust = 0.5, size = 9))
The distribution of the male and female populations across planning areas differs slightly between HDBs and condominiums.
HDB
In most planning areas in Singapore, the male and female populations are roughly equal. However, certain planning areas such as Toa Payoh, Sengkang, Hougang, Bukit Merah, and Ang Mo Kio show differences between the male and female populations.
Condominiums
Certain planning areas such as Bukit Timah, Bukit Batok, and Ang Mo Kio have a disparate male and female distribution.
The final visualization combines Figures 1-8. The following insights and observations can be derived from it:
From the combined data visualization we observe that:
Moreover, this phenomenon is evidenced by Figures 6 and 7. The population in condominiums grew consistently from 2011 to 2019, whereas the population in HDBs fell from 2016 to 2018. Two possible explanations follow: purchasing power may have increased during this period, or residents may have moved from public to private housing.
HDB
However, certain planning areas such as Toa Payoh, Tampines, Sengkang, Hougang, Bukit Merah, and Ang Mo Kio show differences between the male and female populations. Among these planning areas, Tampines, Sengkang, and Ang Mo Kio have the highest populations.
Condominiums
We can compare the male and female populations by property type in each planning area in Singapore. As previously discussed, fewer people live in condominiums than in HDBs. In most planning areas, the male and female populations of condominium residents are roughly equal. However, certain planning areas such as Bukit Timah, Bukit Batok, and Ang Mo Kio have a disparate male and female distribution.