To reveal the demographic structure of Singapore population by Age Cohort and by Planning Area in 2019.
Sketch
This code installs and load the neccessary packages required for the analysis below.
packages <- c("tidyverse","reshape2")
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
In this exercise, we will be using the data set of the population data of Singapore from Year 2000 to 2019. the data set can be downloaded here. We downloaded the 2 csv files in this exercise.
In the code chunk below, read_csv() of readr is used to import both the population csv files, respopagesextod2000to2010.csv and respopagesextod2011to2019.csv into R and parsed it into tibble R data frame format.
pop2000 <- read_csv("Data/respopagesextod2000to2010.csv")
## Parsed with column specification:
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double(),
## Time = col_double()
## )
pop2011 <- read_csv("Data/respopagesextod2011to2019.csv")
## Parsed with column specification:
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double(),
## Time = col_double()
## )
In the code chunk below, we concatenate the files to a master data frame.
popraw <- rbind(pop2000, pop2011)
In the code chunk below, we determine the order of age group.
desired_order <- c("0_to_4",
"5_to_9",
"10_to_14",
"15_to_19",
"20_to_24",
"25_to_29",
"30_to_34",
"35_to_39",
"40_to_44",
"45_to_49",
"50_to_54",
"55_to_59",
"60_to_64",
"65_to_69",
"70_to_74",
"75_to_79",
"80_to_84",
"85_to_89",
"90_and_over")
popraw$AG <- factor(popraw$AG, levels=desired_order)
In the code chunk below, we filter only data from 2019.
pop2019 <- popraw[popraw$Time == "2019",]
In the code chunk below, we look at the total population of each planning area in 2019.
total2019 <- pop2019 %>%
group_by(PA) %>%
summarise(sum(Pop))
In the code chunk below, we remove planning areas that has population of less than 5000. We then further split them into below and above 10000.
residential2019 <- total2019[total2019$`sum(Pop)`> 5000, ]
PA2019 <- residential2019$PA
residential2019above <- residential2019[residential2019$`sum(Pop)` >= 100000, ]
PA2019above <- residential2019above$PA
residential2019below <- residential2019[residential2019$`sum(Pop)` < 100000, ]
PA2019below <- residential2019below$PA
In the code chunk below, we remove the Non-Residential Areas for our data.
pop2019res <- pop2019 %>%
filter(PA %in% PA2019)
In the code chunk below, we split the “Sex” column into 2 new columns, “Males” and “Females”. Convert the Males value to negative to plot Age-Sex Pyramid.
pop2019asp <- pivot_wider(pop2019res, names_from = Sex, values_from = Pop)
pop2019asp$Males <- -1 * pop2019asp$Males
pop2019asp <- pivot_longer(pop2019asp, cols = c("Males", "Females"), names_to = "Sex", values_to = "Pop")
From the visualization below, the number of young (0 - 25) are decreasing. This show the characteristics of an aging population.
ggplot(pop2019asp, aes(x = AG, y = Pop, fill = Sex)) +
geom_col() +
labs(title = "Singapore Age Sex Pyramid 2019", x = "Age Group", y = "Population") +
coord_flip() +
scale_fill_brewer(palette = "Set1") + scale_y_continuous(breaks = seq(-150000, 150000, 50000),
labels = paste0(as.character(c(seq(150, 0, -50), seq(50, 150, 50))), "k")) +
theme(plot.title = element_text(face = "bold", size = (15), colour="steelblue4", hjust = 0.5),
legend.title = element_blank(),
legend.text = element_text(face = "italic", colour="steelblue4"),
axis.title = element_text(size = (10), colour = "steelblue4"),
axis.text = element_text(size = (10), colour = "steelblue4"),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=1))
The visualizations below show the residential areas as compared to areas with similar population sizes. From the visualization by planning area, there are several areas with relatively smaller population. We will overcome this by splitting the planning areas using a cut-off of 10,000.
ggplot(pop2019asp, aes(x = AG, y = Pop, fill = Sex)) +
geom_col() +
labs(title = "Singapore Age Sex Pyramid 2019 by Planning Area", x = "Age Group", y = "Population") +
coord_flip() +
scale_fill_brewer(palette = "Set1") +
theme(plot.title = element_text(face = "bold", size = (15), colour="steelblue4", hjust = 0.5),
legend.title = element_blank(),
legend.text = element_text(face = "italic", colour="steelblue4"),
axis.title = element_text(size = (10), colour = "steelblue4"),
axis.text = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=1)) +
facet_wrap(PA ~ .)
The visualizations below shows the residential areas as compared to areas with similiar population sizes.
The visualizations below show the residential areas as compared to areas with similar population sizes. From the visualization, planning area like Ang Mo Kio and Bedok are experiencing aging population as the number of older people are higher that the active and young. Planning areas like Punggol and Sengkang possess the characteristic of new town. New town are areas which are newly developed. As a result, most of the residents are middle age, most probably they are newlywed. From this visualization, we can guess the number of years where the area was open for residency. For example, Chua Chu Kang, Pasir Ris and Tampines are developed at around the same time period as the mode of the pyramids are similar.
The visualization of the lower population residence planning area, we can see that areas like Newton, Novena, Outran, River Valley, Rochor and Tanglin are having very low population. This is due to their proximity with the central business district, as the land is used for mostly business purposes as a result, we see a lower population.
pop2019aspabove <- pop2019asp %>%
filter(PA %in% PA2019above)
ggplot(pop2019aspabove, aes(x = AG, y = Pop, fill = Sex)) +
geom_col() +
labs(title = "Singapore Age Sex Pyramid 2019 by Planning Area \nPopulation >10000", x = "Age Group", y = "Population") +
coord_flip() +
scale_fill_brewer(palette = "Set1") +
theme(plot.title = element_text(face = "bold", size = (15), colour="steelblue4", hjust = 0.5),
legend.title = element_blank(),
legend.text = element_text(face = "italic", colour="steelblue4"),
axis.title = element_text(size = (10), colour = "steelblue4"),
axis.text = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=1)) +
facet_wrap(PA ~ .)
pop2019aspbelow <- pop2019asp %>%
filter(PA %in% PA2019below)
ggplot(pop2019aspbelow, aes(x = AG, y = Pop, fill = Sex)) +
geom_col() +
labs(title = "Singapore Age Sex Pyramid 2019 by Planning Area\nPopulation < 10000", x = "Age Group", y = "Population") +
coord_flip() +
scale_fill_brewer(palette = "Set1") +
theme(plot.title = element_text(face = "bold", size = (15), colour="steelblue4", hjust = 0.5),
legend.title = element_blank(),
legend.text = element_text(face = "italic", colour="steelblue4"),
axis.title = element_text(size = (10), colour = "steelblue4"),
axis.text = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=1)) +
facet_wrap(PA ~ .)