This visualisation presents the population demographics of Singapore residents. It focuses on the demographic structure of Singapore’s residents by age group and planning area.
The population data set from Singstat covers the period from 2011-2019. It has columns with information of planning area, subzone, gender, age group, type of dwelling and population for the year. The data challenge will be to filter and aggregate the data into different dataframes and matrixes required for each visualisation type.
The design challenge will be to visualize the change of the population structure over the years. There are two main challenges to this. First, the population structure changes slightly each year and this gradual change would not be obvious on the visualisation. Second, the population age structure cannot be be expressed in a single statistic, as the population ages across all age groups. Two visualisations are planned to better visualise this topic. A time series bar chart will show the gradual growth of Singapore’s male and female population over the years. A pair of population pyramids will be prepared to visualise the distribution of people within age groups and compare how they have changed from 2011 to 2019
Another design challenge will be the visualisation of how the population has changed for various planning areas in Singapore overtime, and what is age profile of residents living in these areas. Two visualisations are planned to address this. A divergent stack bar chart will first visualise the population changes in various planning areas in 2019 vs 2011. A heatmap on planning areas will then reveal the age structure of residents in the planning areas.
Install and run tidyverse and plotly.
packages = c('tidyverse','plotly')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
pop_data <- read_csv("data/respopagesextod2011to2019.csv")
Aggregate population by gender and year to create a new dataframe with columns gender, year and population.
popdata1 <- pop_data %>%
mutate(Year = as.character(Time)) %>%
select(Sex,Pop,Year) #%>%
popdata1<- aggregate(popdata1$Pop, by=list(popdata1$Sex, popdata1$Year), FUN=sum)
colnames(popdata1) = c("Sex","Year","Pop")
Use position = “dodge” in geom_bar to plot male/female bars side by side.
ggplot(data=popdata1, aes(x= Year, y = Pop/1000000, fill = Sex)) +
geom_bar(stat="identity",
position = "dodge",
colour = "black"
)+
ggtitle("Resident Population by Year",
subtitle = "Slow increases in resident population")+
scale_x_discrete(name ="YEAR") +
scale_y_continuous(name ="Resident Population in Millions")+
theme_bw()
First, filter data for years 2011 and 2019 and aggregate (sum) population by Age, Gender and Year to create a new dataframe with columns Age, Gender, Year and Population. Then, there is a need to change all population values for Females to negative. This will the female data to be plotted on the left side of the population pyramid. This is done using the ifelse function.
datapy<-subset(pop_data, Time == "2019" | Time =="2011",select=c(-PA,-SZ,-TOD))
colnames(datapy) = c("Age","Sex","Pop","Year")
datapy$Age = ifelse(datapy$Age == "5_to_9", "05_to_9",datapy$Age)
datapy <- aggregate(datapy$Pop, by=list(datapy$Age, datapy$Sex,datapy$Year ), FUN=sum)
colnames(datapy) = c("Age","Sex","Year","Population")
datapy$Pop = ifelse(datapy$Sex == "Females", datapy$Pop*-1,datapy$Pop)
Use facet_grid to plot two pyramid plots for 2011 and 2019.
ggplot(datapy, aes(x=Age,y=Pop, fill=Sex))+
geom_bar(stat = "identity", colour = "black")+
ggtitle('Population Pyramid 2011 vs 2019',
subtitle= "An Ageing Population")+
scale_x_discrete(name ="Age Group") +
scale_y_continuous(breaks = seq(-200000, 200000, 40000),
labels = paste0(as.character(c(seq(0.2, 0, -0.04), seq(0.04, 0.2, 0.04))), "m"),
name ="Population")+
coord_flip()+
facet_grid(rows=vars(Year))
Select and prepare two dataframes, one for 2011 and one for 2019, datapop11 and datapop19 respectively. The columns in each dataframe are planning area and population. Calculate the difference in population between 2011 and 2019 for each planning area and save it to the dataframe which will be used for plotting.
datapop19<-subset(pop_data, Time == "2019",select=c(-AG, -Sex, -Time,-SZ,-TOD))
datapop11<-subset(pop_data, Time == "2011",select=c(-AG, -Sex, -Time,-SZ,-TOD))
colnames(datapop19) = c("PA","Pop")
datapop19 <-aggregate(datapop19$Pop, by=list(datapop19$PA), FUN=sum)
colnames(datapop11) = c("PA","Pop")
datapop11 <-aggregate(datapop11$Pop, by=list(datapop11$PA), FUN=sum)
colnames(datapop11) = c("PA","Pop")
colnames(datapop19) = c("PA","Pop")
datapop19$Pop <- datapop19$Pop - datapop11$Pop
ggplot(data=datapop19, aes(x= reorder(PA,Pop), y = Pop)) +
geom_bar(stat="identity", fill = "lightblue")+
ggtitle('Population Change (2011-2019) by Area',
subtitle ="Punggol and Sengkang are the fastest growing areas")+
scale_x_discrete(name ="Planning Area") +
scale_y_continuous(name ="Population Change")+
coord_flip()+
theme_bw()
Aggregate population data into a new dataframe with columns planning area, age and population. Use spread to create new columns by age group. It is necessary to convert the dataframe to a matrix for plotting a heatmap.
datahm19<-subset(pop_data, Time == "2019",select=c(PA,AG,Pop))
datahm19$AG = ifelse(datahm19$AG == "5_to_9", "05_to_9",datahm19$AG)
datahm19<-aggregate(datahm19$Pop, by=list(datahm19$PA, datahm19$AG), FUN=sum)
colnames(datahm19) = c("PA","AG","Pop")
df<-spread(datahm19, AG, Pop)
df1<- select(df,-PA)
row.names(df1)<-df$PA
matrix <- data.matrix(df1)
It is necessary to scale the data across the rows(planning area). This will allow the differences in population across different areas to be normalised, allowing the relative concentrations of various age groups within each planning area to be better compared.
heatmap <- heatmap(matrix,
scale ="row",
cexRow=0.8,
cexCol=1,
margins = c(6,8),
Rowv=NA,
Colv=NA,
ylab="Area",
xlab="Age Group",
main="Population Heatmap by Area"
)
The time series bar chart shows the male and female Singapore resident population growing slowly from 2011-2019. We should thus expect on average, a small increase in population across all planning areas. This visualisation also reveals that there are slightly more females than males in Singapore.
Both pyramids are fat in the middle and narrow at the top at bottom. The fat center shows that there are most people in the middle age group (40-50 years). Also, the fat center implies that the population is aging, as there are fewer young than old people. Comparing the two pyramids, we notice that the 2019 population pyramid is fatter on top and thinner below compared with the 2011 pyramid. This shows that the aging trend has accelerated from 2011 to 2019.
The following diverging bar chart reveals that the fastest growing towns in the past years were Punggol, Sengkang and Yishun. The towns with fastest shrinking populations are Bedok, Ang Mo Kio and Geylang.
The population heatmap compares and reveals the relative density of various age group across different planning areas. It can be observed that the fastest growing areas (from above) like Punggol have darker colours concentrated on younger age groups(ages 30-39) as compared with shrinking areas like Bedok which have darker colours concentrated on older age groups (ages 45-65). This reveals that young people are mainly the ones moving into the growing areas.