Singapore Resident Population 2011-19

Singapore’s population is multi-ethnic. One of the key indicators of Singapore’s economic success is the inclusiveness of its public housing.

Through this data visualization exercise, we want to:

  1. Study the population growth rate across gender, dwelling type, and planning area from 2011 to 2019.
  2. Understand the difference between public and private housing.
  3. Compare the characteristics of public and private housing in different planning areas in Singapore.

This document provides executable R code. The packages used in this exercise are dplyr, tidyverse, and ggplot2.

Rpubs link: https://rpubs.com/sashankramesh/642489

Instructions: To run the file locally, set the working directory at line #84 of the source file.

Data and Design Challenges

Population growth in Singapore

Population growth in Singapore by Housing Type

Population growth in Singapore by Planning Area

The data comes from https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data. Before diving deeper into the visualization, we had to resolve several data and design challenges. First, let’s look at the data challenges:

  1. Too many categories in dwelling type

Initially, the dataset contained separate categories for the different types of HDB properties. We grouped all HDB property types into a single HDB category and kept the remaining property types as Condominiums, HUDC properties, Landed Properties, and Others.

We grouped the properties to simplify the analysis.

  2. Remove rows with ‘population=0’

The ‘Pop’ column contains values of ‘0’, which indicate that there are no residents in that specific category. We drop these rows.

Hence, the dataset contains only the combinations of sub-group, dwelling type, and age group where the population is greater than 0.

  3. Age categories

We will not use age groups in our visualizations. The age group values are not in a convenient format for analysis, but since we do not use them, we leave their format unchanged.
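The dwelling-type grouping described in the first data challenge is not shown in the code of this document. A minimal sketch of one way to do it follows. The raw labels and population values are hypothetical, and the exact strings in the SingStat file may differ.

```r
library(dplyr)

# Hypothetical raw labels and made-up values, for illustration only.
raw <- data.frame(
  TOD = c("HDB 3-Room Flats", "HDB 4-Room Flats",
          "Condominiums and Other Apartments",
          "Landed Properties", "Others"),
  Pop = c(120, 300, 80, 40, 10),
  stringsAsFactors = FALSE
)

# Collapse every label that starts with "HDB" into a single "HDB" group,
# shorten the condominium label, and leave the remaining labels unchanged.
grouped <- raw %>%
  mutate(TOD = case_when(
    grepl("^HDB", TOD)          ~ "HDB",
    grepl("^Condominiums", TOD) ~ "Condominiums",
    TRUE                        ~ TOD
  ))

unique(grouped$TOD)
```

Matching on a prefix keeps the rule short, but it assumes every HDB label begins with "HDB". It is worth checking the unique values of TOD in the real file before relying on it.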

Install Packages and Load File

The packages that we need for our data visualization process are:

  1. tidyverse
  2. ggplot2
  3. dplyr

The input file is downloaded from SingStat. The data describes the population by dwelling type, age group, sex, planning area, and year. As mentioned previously, some records have a population of 0, so we drop those rows.

library(tidyverse)  # attaches dplyr and ggplot2
input <- read.csv("respopagesextod2011to2019.csv")
# Drop the rows where the population is 0
input <- input[input$Pop != 0, ]

Population Growth 2011-19

Population growth matters in the Singapore context. In this study, we look at population growth in absolute terms and in comparison with the other dimensions in the dataset. Firstly, let’s look at Figure 1, which shows the absolute growth of the population in Singapore from 2011-19.

We use dplyr functions to manipulate the data. The steps to generate the chart are: a) Time is recognised as an integer, so we convert it to character to represent discrete values. b) We select three columns: Time, property type (TOD), and Pop. c) We group by Time and property type. d) We sum the population to represent the growth across years.

tempo <- input %>%
  mutate(Time = as.character(Time)) %>%
  select(Time, TOD, Pop) %>%
  group_by(Time, TOD) %>%
  summarise(pop_sum = sum(Pop))

# Sum across property types so that each bar shows the total population for the year
ggplot(tempo, aes(x = Time, y = pop_sum)) +
  stat_summary(geom = "bar", color = "black", fun = sum, fill = "lightblue") +
  labs(title = "Figure 1: Population in Singapore, 2011-2019", x = "Year",
       y = "Population", caption = "Source: SingStat")

In Figure 1, we observe that the population remains fairly constant from 2011 to 2016 and then increases significantly from 2016 to 2019. The population shown is the resident population of Singapore.
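Figure 1 shows population levels rather than growth rates. As a hedged sketch, the year-over-year growth rate could be computed with dplyr’s lag(). The yearly totals below are made up for illustration and are not SingStat figures, and the column name growth_pct is our own.

```r
library(dplyr)

# Made-up yearly totals, for illustration only (not SingStat figures).
yearly <- data.frame(
  Time    = 2011:2014,
  pop_sum = c(1000, 1010, 1030, 1081.5)
)

# Year-over-year growth in percent:
# (this year - previous year) / previous year * 100.
growth <- yearly %>%
  arrange(Time) %>%
  mutate(growth_pct = (pop_sum - lag(pop_sum)) / lag(pop_sum) * 100)

growth
```

The first year has no previous value, so its growth rate is NA. With the real data, the same mutate() step could be applied to a table of yearly totals.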

Population Growth by Gender

Now let’s look at the growth of the resident population of Singapore by gender. The population in the chart below is represented in 000s.

As before, we use dplyr functions to manipulate the data. The steps to generate the chart are: a) Time is recognised as an integer, so we convert it to character to represent discrete values. b) We select three columns: Time, Sex, and Pop. c) We group by Time and Sex. d) We sum the population to represent the growth across years.

temp <- input %>%
  mutate(Time = as.character(Time)) %>%
  select(Time, Sex, Pop) %>%
  group_by(Time, Sex) %>%
  summarise(pop_sum = sum(Pop))

# Plot yearly totals in thousands; no jitter, so each point sits at its actual value
ggplot(temp, aes(x = Time, y = pop_sum / 1000, color = Sex)) +
  geom_point(size = 3) +
  labs(title = "Figure 2: Population in Singapore, 2011-2019", x = "Year",
       y = "Population (000s)", caption = "Source: SingStat") +
  theme_classic() + scale_color_manual(values = c("#F13B17", "#4472C4")) +
  theme(plot.title = element_text(hjust = 0.5),
        plot.caption = element_text(hjust = 1, face = "italic"))

Figure 2: The population is shown in 000s. The male and female resident populations differ: in every year from 2011 to 2019, there are more females than males. However, the growth rates of the female and male populations are similar.
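The claim that both groups grow at a similar rate can be checked numerically. The sketch below computes the percentage change between the first and last year for each sex. The totals are made up for illustration, and the labels "Females" and "Males" are assumed values of the Sex column.

```r
library(dplyr)

# Made-up totals by year and sex, for illustration only (not SingStat figures).
by_sex <- data.frame(
  Time    = c(2011, 2011, 2019, 2019),
  Sex     = c("Females", "Males", "Females", "Males"),
  pop_sum = c(2000, 1900, 2200, 2090),
  stringsAsFactors = FALSE
)

# Percentage change between the first and last year within each sex.
growth_by_sex <- by_sex %>%
  group_by(Sex) %>%
  arrange(Time, .by_group = TRUE) %>%
  summarise(growth_pct = (last(pop_sum) - first(pop_sum)) / first(pop_sum) * 100)

growth_by_sex
```

In this toy example both groups grow by the same percentage even though their levels differ, which is the pattern described for Figure 2.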

Population Growth by Housing Type

We want to understand the types of properties and the population growth by property type.

Firstly, we will look at the distribution of population values by property type in Singapore. Steps involved are:

  - Filter the input values to 2019 only
  - Use ggplot with the filtered dataframe
  - Use Pop as the input variable for this chart
  - Use geom_histogram(bins=20, color=‘grey30’) to understand the distribution
  - Use facet_wrap to view the distribution for each property type

Secondly, we will generate two vertical bar charts to compare the median population for each property type in Singapore in 2011 and 2019. Steps involved are:

  - Filter the input values to 2011 and to 2019
  - Use stat_summary with fun=‘median’ to draw one bar per property type
  - Use reorder to sort the property types in descending order of population

prop <- input %>%
  filter(Time == "2019")
prop1 <- input %>%
  filter(Time == "2011")

# Histogram of row-level population values, one panel per property type
ggplot(data = prop, aes(x = Pop)) +
  geom_histogram(bins = 20, color = "grey30") +
  facet_wrap(~TOD) +
  labs(title = "Figure 3: Population in Singapore by Housing type, 2019")

# 2011: median row-level population per property type
ggplot(data = prop1, aes(x = reorder(TOD, -Pop), y = Pop)) +
  stat_summary(geom = "bar", color = "black", fun = "median", fill = "lightblue") +
  labs(title = "Figure 4: Population in Singapore by Housing type, 2011",
       x = "Property Type", y = "Population", caption = "Source: SingStat")

# 2019: median row-level population per property type
ggplot(data = prop, aes(x = reorder(TOD, -Pop), y = Pop)) +
  stat_summary(geom = "bar", color = "black", fun = "median", fill = "lightblue") +
  labs(title = "Figure 5: Population in Singapore by Housing type, 2019",
       x = "Property Type", y = "Population", caption = "Source: SingStat")

In Figure 3, it is important to note that HDB has the highest population as of 2019, and Condominiums have the second highest, as indicated by the histogram.

The two vertical bar charts in Figures 4 and 5 compare the population by property type in 2011 and 2019. Interestingly, the median population in HDB and condominiums is higher in 2019 than in 2011. This may be attributable to the rising population in Singapore from 2016 to 2019.
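The comparison of medians can also be read from a table rather than from the two charts. The sketch below computes the median per property type and year on made-up rows. The values are illustrative only, and each row stands for one cell of the input file, such as one planning area, age group, and sex combination.

```r
library(dplyr)

# Made-up rows, for illustration only (not SingStat figures).
cells <- data.frame(
  Time = c(2011, 2011, 2011, 2019, 2019, 2019, 2011, 2011, 2019, 2019),
  TOD  = c(rep("HDB", 6), rep("Condominiums", 4)),
  Pop  = c(100, 200, 300, 150, 250, 350, 10, 30, 20, 60),
  stringsAsFactors = FALSE
)

# Median row-level population per property type and year: the same
# statistic that stat_summary() with fun = 'median' draws as bars.
medians <- cells %>%
  filter(Time %in% c(2011, 2019)) %>%
  group_by(TOD, Time) %>%
  summarise(median_pop = median(Pop), .groups = "drop")

medians
```

Note that this is the median of the individual rows, not the total population of a property type, so it describes a typical cell size rather than the overall number of residents.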

HDB vs Condominiums

Let’s now study the growth of population in HDBs and Condominiums from 2011 to 2019.

Growth in HDB:

  - Use dplyr’s filter() to keep the rows containing HDB in the property type column
  - Use geom_point() to generate a scatter plot of population growth in HDBs from 2011-2019

Growth in Condominiums:

  - Use dplyr’s filter() to keep the rows containing Condominiums in the property type column
  - Use geom_point() to generate a scatter plot of population growth in Condominiums from 2011-2019

tempor <- input %>%
  mutate(Time = as.character(Time)) %>%
  select(Time, TOD, Pop) %>%
  filter(TOD == "HDB") %>%
  group_by(Time, TOD) %>%
  summarise(pop_sum = sum(Pop))

ggplot(tempor, aes(x = Time, y = pop_sum / 1000, color = TOD)) +
  geom_point(size = 3) +
  labs(title = "Figure 6: Growth of HDB, 2011-2019", x = "Year",
       y = "Population (000s)", caption = "Source: SingStat") +
  theme_classic() + scale_color_manual(values = c("#F13B17")) +
  theme(plot.title = element_text(hjust = 0.5),
        plot.caption = element_text(hjust = 1, face = "italic"))

temporar <- input %>%
  mutate(Time = as.character(Time)) %>%
  select(Time, TOD, Pop) %>%
  filter(TOD == "Condominiums") %>%
  group_by(Time, TOD) %>%
  summarise(pop_sum = sum(Pop))

# facet_wrap() is added as its own layer, outside theme()
ggplot(temporar, aes(x = Time, y = pop_sum / 1000, color = TOD)) +
  geom_point(size = 3) +
  labs(title = "Figure 7: Growth of Condominiums, 2011-2019", x = "Year",
       y = "Population (000s)", caption = "Source: SingStat") +
  theme_classic() + scale_color_manual(values = c("#4472C4")) +
  theme(plot.title = element_text(hjust = 0.5),
        plot.caption = element_text(hjust = 1, face = "italic")) +
  facet_wrap(~TOD)

At first glance, we observe displacement from HDBs to condominiums from 2011 to 2019. Diving deeper, the scatter plots show that the populations of HDBs and Condominiums follow different growth patterns.

The HDB population increased slowly and steadily from 2011 to 2015. However, it dropped significantly from 2016 to 2018. Signs of recovery are evident in the 2019 numbers.

In comparison, the Condominium population increased steadily from 2011-2019. The curve steepens from 2017-19, indicating higher growth from 2017.

Population Growth by Planning Area

Planning areas in Singapore are areas such as Yishun, Bedok, and Serangoon. We want to look at the distribution of population by sex in each planning area in Singapore in 2019.

Steps involved are:

  - Use dplyr’s mutate() to convert Time to a string
  - Filter the dataframe to rows that include only 2019 values
  - Group by planning area (PA), property type (TOD), Time, and Sex
  - Use stat_summary(geom=‘point’) to generate the chart
  - Map color to Sex to distinguish males and females in each planning area

# HDB
tempora <- input %>%
  mutate(Time = as.character(Time)) %>%
  select(PA, Time, TOD, Pop, Sex) %>%
  filter(Time == "2019", TOD == "HDB") %>%
  group_by(PA, TOD, Time, Sex) %>%
  summarise(pop_sum = sum(Pop))

ggplot(data = tempora, aes(x = pop_sum / 1000, y = PA, fill = Sex, color = Sex)) +
  stat_summary(geom = "point", shape = 21, size = 2, fun = mean, fill = "lightblue") +
  theme(axis.text = element_text(size = 4),
        plot.caption = element_text(hjust = 1, face = "italic"),
        plot.title = element_text(hjust = 0.5, size = 9)) +
  scale_color_manual(values = c("#F13B17", "#4472C4")) +
  labs(x = "Population(000s)", y = "Planning Area", fill = "Sex",
       title = "Figure 8: Singapore Resident Population (HDB) by Sex and Planning Area, 2019",
       caption = "Source: SingStat")

# Condo: 2019 only and in thousands, to match the title and the axis label
tempora1 <- input %>%
  mutate(Time = as.character(Time)) %>%
  select(PA, Time, TOD, Pop, Sex) %>%
  filter(Time == "2019", TOD == "Condominiums") %>%
  group_by(PA, TOD, Time, Sex) %>%
  summarise(pop_sum = sum(Pop))

ggplot(data = tempora1, aes(x = pop_sum / 1000, y = PA, fill = Sex, color = Sex)) +
  stat_summary(geom = "point", shape = 21, size = 2, fun = mean, fill = "lightblue") +
  theme(axis.text = element_text(size = 4),
        plot.caption = element_text(hjust = 1, face = "italic"),
        plot.title = element_text(hjust = 0.5, size = 9)) +
  scale_color_manual(values = c("#F13B17", "#4472C4")) +
  labs(x = "Population(000s)", y = "Planning Area", fill = "Sex",
       title = "Figure 9: Singapore Resident Population (Condo) by Sex and Planning Area, 2019",
       caption = "Source: SingStat")

The distribution of the male and female populations across planning areas is slightly different for HDBs and Condominiums.

HDB

Most planning areas in Singapore have an equal distribution between the male and female populations. However, certain planning areas such as Toa Payoh, Sengkang, Hougang, Bukit Merah, and Ang Mo Kio show differences in population distribution between males and females.

Condominiums

Certain planning areas such as Bukit Timah, Bukit Batok, and Ang Mo Kio have a disparate male and female distribution.
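The differences between males and females are read off the charts by eye. One way to quantify them, offered here only as a sketch, is the number of females per 100 males in each planning area. The area names and totals below are made up, and the labels "Females" and "Males" are assumed values of the Sex column.

```r
library(dplyr)

# Made-up 2019 totals by planning area and sex, for illustration only.
pa_sex <- data.frame(
  PA      = c("Area A", "Area A", "Area B", "Area B"),
  Sex     = c("Females", "Males", "Females", "Males"),
  pop_sum = c(110, 100, 95, 100),
  stringsAsFactors = FALSE
)

# Females per 100 males in each planning area, sorted so that the areas
# furthest from parity (100) come first.
ratio <- pa_sex %>%
  group_by(PA) %>%
  summarise(
    females_per_100_males =
      sum(pop_sum[Sex == "Females"]) / sum(pop_sum[Sex == "Males"]) * 100,
    .groups = "drop"
  ) %>%
  arrange(desc(abs(females_per_100_males - 100)))

ratio
```

A ratio is easier to compare across areas of different sizes than the gap between two points on a chart, since large areas show large absolute gaps even when the sexes are nearly balanced.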

Data Viz

The final visualization combines the figures above. The insights and observations below are derived from the combined visualization.

Insights

From the combined data visualization we observe that:

  1. From 2011-19, the popularity of condominiums has increased, suggesting higher purchasing power in the Singapore community. This is evidenced by two factors. Firstly, the median population for HDB occupancy increased from 2011 to 2019, and we also witnessed an increase in Condominium occupancy, which can be attributed to the population increase from 2011 to 2019.

Moreover, this phenomenon is evidenced by Figure 6 and Figure 7. The growth rate of the population in condominiums increased consistently from 2011-2019. However, the growth rate of the population in HDBs decreased from 2016-2018. Two possible explanations follow: purchasing power could have increased during this period, or residents could have moved from public housing to private housing.

  2. Interestingly, the male and female populations are equally distributed in most planning areas. We must look at the population as a comparison between those that reside in HDBs and those that reside in Condominiums.

HDB

However, certain planning areas such as Toa Payoh, Tampines, Sengkang, Hougang, Bukit Merah, and Ang Mo Kio show differences in population distribution between males and females. Amongst these planning areas, Tampines, Sengkang, and Ang Mo Kio have the highest population.

Condominiums

We can compare the male and female population by property type in each planning area in Singapore. As previously discussed, fewer people reside in condos than in HDBs. The male and female populations are equally distributed in most planning areas. However, certain planning areas such as Bukit Timah, Bukit Batok, and Ang Mo Kio have a disparate male and female distribution.
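The displacement reading in the first insight could be checked by looking at shares rather than levels: if residents move from public to private housing, the HDB share of the population should fall while the Condominium share rises. The sketch below computes these shares on made-up totals, which are illustrative only and not SingStat figures.

```r
library(dplyr)

# Made-up totals by year and property type, for illustration only.
by_type <- data.frame(
  Time    = c(2011, 2011, 2019, 2019),
  TOD     = c("HDB", "Condominiums", "HDB", "Condominiums"),
  pop_sum = c(900, 100, 850, 150),
  stringsAsFactors = FALSE
)

# Share of each year's population living in each property type, in percent.
shares <- by_type %>%
  group_by(Time) %>%
  mutate(share_pct = pop_sum / sum(pop_sum) * 100) %>%
  ungroup()

shares
```

A shift in shares is consistent with displacement but does not prove it, since new residents choosing condominiums would produce the same pattern.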