Singapore has one of the world's rapidly ageing populations, as reported by the World Economic Forum in February 2020. https://www.weforum.org/agenda/2020/02/what-are-japan-and-singapore-doing-about-ageing-population/
In this exercise, we look into the demographic structure of Singapore and draw insights from it using a range of data visualisation charts.
The dataset is sourced from the Singapore Department of Statistics. It details the demographic data of Singapore Residents by Planning Area, Subzone, Age Group, Sex and Type of Dwelling from June 2011 to June 2019.
(https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data)
Through the visualisation, we aim to answer the following questions:
There were 883,728 rows of data in the original dataset. It was important to clean the data and summarise it appropriately before feeding it into the different charts. After filtering out records with a population count of 0, 257,809 rows remained. The dataset consists of 55 Planning Areas, 323 Subzones, 19 Age Groups and 8 Dwelling Type groupings. There were six categorical variables and two numeric variables.
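The zero-count filter and the category tallies described above can be sketched on a toy data frame. The rows below are made up purely for illustration; only the column names (PA, SZ, AG, Sex, TOD, Pop, Time) follow the source file, and `toy`, `toy_nonzero` and `n_levels` are hypothetical names.

```r
# Toy stand-in for the raw file; values are illustrative, not real data
toy <- data.frame(
  PA   = c("Ang Mo Kio", "Ang Mo Kio", "Bedok", "Bedok"),
  SZ   = c("Cheng San", "Cheng San", "Bedok North", "Bedok South"),
  AG   = c("0_to_4", "5_to_9", "0_to_4", "0_to_4"),
  Sex  = c("Males", "Females", "Males", "Females"),
  TOD  = c("HDB 3-Room Flats", "Landed Properties",
           "HDB 3-Room Flats", "HDB 4-Room Flats"),
  Pop  = c(10L, 0L, 30L, 0L),
  Time = c(2011L, 2011L, 2011L, 2011L),
  stringsAsFactors = FALSE
)

# Keep only rows with a non-zero population count
toy_nonzero <- toy[toy$Pop > 0, ]

# Number of distinct values in each categorical column
n_levels <- sapply(toy_nonzero[c("PA", "SZ", "AG", "Sex", "TOD")],
                   function(x) length(unique(x)))

print(nrow(toy_nonzero))
print(n_levels)
```

Running the same two steps on the full dataset is what reduces the row count and yields the category counts quoted above.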
The Planning Area and Subzone columns contain a large number of categories, which would make the charts look cluttered and prevent the story from being told effectively. To make demographic comparisons between areas easier, the REGION column was retrieved separately and combined with the original dataset. (https://data.gov.sg/dataset/master-plan-2014-planning-area-boundary-web) Grouping Planning Areas into Regions improves readability and contrast across the visualisations. The age groups were also combined into three distinct groups: the Young, the Economically Active and the Elderly.
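As a rough sketch of the age banding, the helper below maps a five-year age-group label to one of the three groups. The function name `age_band` is hypothetical, and the cut-offs (0 to 24, 25 to 64, 65 and over) are an assumption taken from the age lists used in the preparation code of this write-up.

```r
# Map a 5-year age-group label to one of three broad bands.
# Cut-offs (0-24, 25-64, 65+) are assumed from the preparation code.
age_band <- function(age_group) {
  # Lower bound of the group: leading digits of labels such as
  # "0_to_4" or "90_and_over"
  lower <- as.integer(sub("^([0-9]+).*$", "\\1", age_group))
  cut(lower,
      breaks = c(-Inf, 24, 64, Inf),
      labels = c("Young", "Economically Active", "Elderly"))
}

bands <- age_band(c("0_to_4", "20_to_24", "25_to_29",
                    "60_to_64", "65_to_69", "90_and_over"))
print(bands)
```

Deriving the band from the numeric lower bound avoids having to list every label by hand, which makes it harder to leave a group out by accident.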
This task was challenging because the master-plan-2014-planning-area-boundary-web file is a .shp file, which was initially difficult to read. The sf package was installed and loaded, and a spatial (sf) object was created from the file. The sf object also carries the geometry details, which are useful when plotting maps. However, this column was dropped in later steps so that the work could continue with plain data frame objects.
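The effect of dropping the geometry column can be illustrated without the shapefile. The sketch below builds a tiny sf object by hand; the two points and the attribute values are made up and do not represent real boundaries, and `toy_sf` and `lookup` are hypothetical names.

```r
library(sf)

# Two made-up points standing in for subzone geometries (not real boundaries)
toy_sf <- st_sf(
  SUBZONE_N = c("CHENG SAN", "BEDOK NORTH"),
  REGION_N  = c("NORTH-EAST REGION", "EAST REGION"),
  geometry  = st_sfc(st_point(c(103.85, 1.37)),
                     st_point(c(103.93, 1.33)),
                     crs = 4326)
)

# Dropping the geometry column leaves an ordinary data frame
lookup <- st_drop_geometry(toy_sf)

print(class(lookup))
print(names(lookup))
```

With the real file, `read_sf()` would return the sf object directly, and the same `st_drop_geometry()` call would turn it into a plain lookup table of subzones, planning areas and regions.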
The data had to be summarised differently depending on the problem statement being addressed. The population was summed by planning area and by house type respectively, as elaborated in the steps below.
The functions mutate() and summarise() were used to create new columns such as Population_Cnt (sum of population by planning area) and the % of people living in Landed property.
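A minimal sketch of how mutate() and summarise() can produce a "% living in Landed property" column is shown below. The rows are illustrative only, and `landed_share`, `is_landed`, `Total_Pop`, `Landed_Pop` and `Pct_Landed` are hypothetical names rather than columns from the original analysis.

```r
library(dplyr)

# Illustrative rows only; column names follow the renamed dataset
toy <- data.frame(
  REGION_N       = c("East Region", "East Region",
                     "West Region", "West Region"),
  Dwelling_Type  = c("Landed Properties", "HDB 4-Room Flats",
                     "Landed Properties", "HDB 4-Room Flats"),
  Population_Cnt = c(200, 800, 100, 900)
)

landed_share <- toy %>%
  # Flag rows that describe landed dwellings
  mutate(is_landed = grepl("Landed", Dwelling_Type)) %>%
  group_by(REGION_N) %>%
  # Total residents and landed residents per region
  summarise(
    Total_Pop  = sum(Population_Cnt),
    Landed_Pop = sum(Population_Cnt[is_landed]),
    .groups = "drop"
  ) %>%
  # Share of residents living in landed property, in percent
  mutate(Pct_Landed = Landed_Pop / Total_Pop * 100)

print(landed_share)
```

Summing Population_Cnt, rather than counting rows, is what makes the result a share of people instead of a share of records.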
This dataset mainly involves categorical variables, so I explored the different possible charts and selected the most effective one for each question. In addition, with 55 Planning Areas and 323 Subzones to consider, static visualisations bring some challenges. Hence, I decided it would be better to bring in the Region details to present the Singapore demographic data more clearly.
This section walks through the steps used to prepare the various visualisations.
Load the packages tidyverse, sf and CGPfunctions.
####Loading packages
packages <- c('tidyverse', 'sf', 'CGPfunctions')
libraries <- function(packages) {
  for (package in packages) {
    #checks if package is installed (require() also loads it when available)
    if (!require(package, character.only = TRUE)) {
      #If package does not exist, then it will install
      install.packages(package, dependencies = TRUE)
      #Loads package
      library(package, character.only = TRUE)
    }
  }
}
libraries(packages)
Load the below datasets:
#Loading Dataset
#Set working directory
SG_population <- read.csv("respopagesextod2011to2019.csv")
str(SG_population)
## 'data.frame': 883728 obs. of 7 variables:
## $ PA : Factor w/ 55 levels "Ang Mo Kio","Bedok",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ SZ : Factor w/ 323 levels "Admiralty","Airport Road",..: 8 8 8 8 8 8 8 8 8 8 ...
## $ AG : Factor w/ 19 levels "0_to_4","10_to_14",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Sex : Factor w/ 2 levels "Females","Males": 2 2 2 2 2 2 2 2 1 1 ...
## $ TOD : Factor w/ 8 levels "Condominiums and Other Apartments",..: 2 3 4 5 6 7 1 8 2 3 ...
## $ Pop : int 0 10 30 50 0 0 40 0 0 10 ...
## $ Time: int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
#Loading dataset of region names
maplocation <- read_sf("master-plan-2014-subzone-boundary-web/master-plan-2014-subzone-boundary-web-shp/MP14_SUBZONE_WEB_PL.shp")
#Preparing Dataset
#1. Renaming Column Names
names(SG_population)[1:7] <- c("Planning_Area","Sub_Zone","Age_Group","Gender","Dwelling_Type","Population_Cnt","Year")
#Subzone, planning area and region names are in upper case, so convert them to title case
maplocation$SUBZONE_N <- str_to_title(maplocation$SUBZONE_N)
maplocation$PLN_AREA_N <- str_to_title(maplocation$PLN_AREA_N)
maplocation$REGION_N <- str_to_title(maplocation$REGION_N)
combineddata <- left_join(maplocation, SG_population, by = c("SUBZONE_N" = "Sub_Zone"))
combineddata <- combineddata %>% dplyr::select(REGION_N, Planning_Area,Age_Group,SUBZONE_N,Gender, Dwelling_Type, Population_Cnt, Year )
combineddata <- st_drop_geometry(combineddata)
combineddata$Age_Group <- str_replace_all(combineddata$Age_Group, "_to_", "-")
combineddata$Age_Group <- as.factor(combineddata$Age_Group)
combineddata$SUBZONE_N <- as.factor(combineddata$SUBZONE_N)
combineddata$REGION_N <- as.factor(combineddata$REGION_N)
#3. Filter those areas with population>0
combineddata <- subset(combineddata,combineddata$Population_Cnt>0)
#4. Extracting data in 2019
combineddata_2019 <- subset(combineddata, Year == 2019)
#creating young, economically active and old subdatasets
young <- combineddata %>%
  filter(Age_Group %in% c('0-4','5-9','10-14','15-19','20-24')) %>%
  group_by(REGION_N, Year) %>%
  summarise(Young = sum(Population_Cnt))
economically_active <- combineddata %>%
  filter(Age_Group %in% c('25-29','30-34','35-39','40-44','45-49','50-54','55-59','60-64')) %>%
  group_by(REGION_N, Year) %>%
  summarise(economically_active = sum(Population_Cnt))
old <- combineddata %>%
  #'85-89' is included so that every age group from 65 upwards is counted
  filter(Age_Group %in% c('65-69','70-74','75-79','80-84','85-89','90_and_over')) %>%
  group_by(REGION_N, Year) %>%
  summarise(old = sum(Population_Cnt))
#combine the datasets
young_old <- merge(young, old, by = c("REGION_N", "Year"))
total <- merge(young_old, economically_active, by = c("REGION_N", "Year"))
#% young, % economically active, % old
summarised_data <- total %>% group_by(REGION_N, Year) %>%
  mutate(`% Young` = (Young/(Young+economically_active+old))*100) %>%
  mutate(`% Economically Active` =
           (economically_active/(Young+economically_active+old))*100) %>%
  mutate(`% old` = (old/(Young+economically_active+old))*100) %>%
  mutate(`Total` = (Young+economically_active+old)) %>%
  mutate(Old_Age_support_ratio = economically_active/old)
summarised_data_2011_2019 <- summarised_data %>%
  filter(Year %in% c('2011','2015','2019'))
summarised_data_2011_2019$Year <- as.character(summarised_data_2011_2019$Year)
slope <- summarised_data_2011_2019 %>%
  select(Year, Old_Age_support_ratio, REGION_N) %>%
  filter(Year %in% c("2011", "2015", "2019")) %>%
  group_by(REGION_N, Year) %>%
  arrange(Year)
slope2 <- as.data.frame(slope) %>% mutate(across(where(is.numeric), ~ round(., 2)))
A declining Old-Age Support Ratio is observed over the years. The Old-Age Support Ratio is defined as the ratio of the number of people capable of providing economic support to the number of elderly people who may depend on the support of others. (https://www.singstat.gov.sg/modules/infographics/old-age-support-ratio) This is a pressing issue: visualising the old-age support ratio over the years shows the trend of the ageing population and points to the actions needed to support this group.
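As a small worked example of the definition, the ratio is simply the economically active count divided by the elderly count. The figures below are invented for illustration and are not taken from the dataset; `Region_A` and `Region_B` are placeholder names.

```r
# Illustrative counts only (not taken from the dataset)
economically_active <- c(Region_A = 600000, Region_B = 450000)
old                 <- c(Region_A = 150000, Region_B = 60000)

# Working-age residents per elderly resident
old_age_support_ratio <- round(economically_active / old, 2)
print(old_age_support_ratio)
```

A lower value means fewer working-age residents per elderly resident, so in this toy case Region_A would be under more pressure than Region_B.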
For the chart below, the CGPfunctions library was loaded and the newggslopegraph function was applied.
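The slopegraph call itself is not shown in this write-up. The sketch below is one way it might look, using argument names as I understand them from the CGPfunctions documentation. The data frame `toy_slope` and its values are illustrative placeholders with the same three columns as slope2, not the actual ratios.

```r
library(CGPfunctions)

# Illustrative ratios only; the real values would come from slope2
toy_slope <- data.frame(
  Year = c("2011", "2015", "2019", "2011", "2015", "2019"),
  Old_Age_support_ratio = c(6.1, 5.2, 4.3, 7.4, 6.0, 4.9),
  REGION_N = rep(c("Region A", "Region B"), each = 3)
)

p <- newggslopegraph(
  dataframe   = toy_slope,
  Times       = Year,                   # character, factor or ordered column
  Measurement = Old_Age_support_ratio,  # numeric column
  Grouping    = REGION_N,               # one line per region
  Title       = "Old-Age Support Ratio by Region",
  SubTitle    = "2011, 2015 and 2019",
  Caption     = "Illustrative data"
)
p
```

This is why Year is converted to character in the preparation code: the time column is expected to be a character or factor column rather than a number.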
The plot below charts the number of residents (in thousands) in each Planning Area by gender in 2019.
g1 <- combineddata %>%
  filter(Year %in% c('2019')) %>%
  group_by(Planning_Area, REGION_N, Gender, Year) %>%
  summarise(PopCnt = sum(Population_Cnt)/1000) %>%
  arrange(Planning_Area)
ggplot(g1, aes(PopCnt, Planning_Area)) +
  geom_line(aes(group = Planning_Area)) +
  geom_point(aes(color = Gender)) + scale_color_brewer(palette = "Dark2") +
  scale_fill_manual(values = c("green", "grey", "red")) +
  labs(title = "Population Distribution by Planning Area and Gender", x = "Population (In thousands)", subtitle = "") +
  theme_minimal() +
  #only the y-axis title is hidden, so the x-axis label set in labs() is shown
  theme(axis.title.y = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank(),
        legend.title = element_blank(),
        legend.justification = c(0.5, 2),
        legend.position = c(.1, 1.075),
        legend.direction = "horizontal",
        plot.title = element_text(size = 20, margin = margin(b = 10)),
        plot.subtitle = element_text(size = 10, color = "darkslategrey", margin = margin(b = 25)),
        plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0))
In this chart, the percentage of people living in each housing type is plotted by region. This shows the general pattern across the regions in the type of housing residents live in.
combineddata_housing <- combineddata %>%
  select(-Planning_Area, -SUBZONE_N) %>%
  group_by(REGION_N, Year) %>%
  filter(Year == '2019') %>%
  mutate(Housing_Type = case_when(
    grepl("HDB", Dwelling_Type) ~ "HDB",
    grepl("Condominiums", Dwelling_Type) ~ "Condominiums and Other Apartments",
    grepl("Landed", Dwelling_Type) ~ "Landed",
    grepl("Others", Dwelling_Type) ~ "Others",
    #fixed = TRUE matches the parentheses literally instead of as a regex group
    grepl("HUDC Flats (excluding those privatised)", Dwelling_Type, fixed = TRUE) ~ "Others"))
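percent_wall_type <- combineddata_housing %>%
  #weight by population so the shares reflect residents rather than number of rows
  count(REGION_N, Housing_Type, wt = Population_Cnt) %>%
  group_by(REGION_N) %>%
  mutate(percent = n / sum(n) * 100) %>%
  ungroup() %>%
  arrange(Housing_Type)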
percent_wall_type %>%
  ggplot(aes(x = REGION_N, y = percent, fill = Housing_Type)) +
  scale_fill_brewer(palette = "Pastel2") +
  geom_bar(stat = "identity") + coord_flip() + theme_minimal() +
  theme(text = element_text(size = 20), axis.title = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank(),
        legend.direction = "horizontal",
        legend.title = element_blank(),
        legend.position = "bottom")
From the slope chart, we can observe a general trend of the Old-Age Support Ratio decreasing across the years 2011, 2015 and 2019. This supports the current pressing issue of an ageing population. The Old-Age Support Ratio is a good measure of how many economically active people there are to support each elderly person. As of 2019, the Central Region has the lowest old-age support ratio, at 3.23, of all the regions. This suggests that the Central Region has the largest elderly population relative to its economically active population. It is important that these factors are considered when upgrading estates, so that they are friendly to senior citizens and fitness areas are within easy reach of where elderly residents live.
From the dot plot, we can observe that, in general, there is little difference between the female and male populations across the regions. Areas such as Bedok and Ang Mo Kio show a larger gap between the female and male populations.
From the stacked bar chart, we can observe that HDB flats remain the staple housing type for residents in Singapore. The East region has the highest percentage of people living in Landed property compared to the other regions.