Given the requirement to visualise the demographic structure of Singapore's population by age cohort and planning area in 2019, some data cleaning is needed to extract the relevant data. Since the downloaded dataset covers the years 2011 to 2019, filter it such that year == 2019. Additionally, columns such as subzone and type_of_dwelling are not relevant for our visualisation and can be dropped from the dataframe.
In the original dataset, the age range for 5 to 9 years is represented as “5_to_9”. Because the labels are sorted as text, “5_to_9” is placed after “45_to_49”, which disrupts the ordering of the age groups in the population pyramid. To preserve the ascending order of the age groups, change the representation of this age range to “05_to_9”.
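Zero-padding relies on how text labels happen to sort, which can vary with the collation locale. As an alternative sketch (not the approach used in this post), the order can be fixed explicitly with factor levels. The labels below are a small illustrative subset, and age_levels is a hypothetical name.

```r
# Illustrative subset of the age-group labels, in the desired ascending order
age_levels <- c("0_to_4", "5_to_9", "10_to_14", "45_to_49", "90_and_over")

# Labels as they might arrive from the data, in arbitrary order
age_group <- c("45_to_49", "5_to_9", "90_and_over", "0_to_4", "10_to_14")

# An explicit level order makes sorting (and ggplot2 axis order)
# independent of how the text labels collate
age_factor <- factor(age_group, levels = age_levels)
sorted_groups <- as.character(sort(age_factor))
print(sorted_groups)
```

With this approach the original “5_to_9” label can be kept, at the cost of listing every level once.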
There are a total of 55 planning areas in Singapore, which would make a single visualisation difficult to interpret because of the many overlapping points. As such, we can categorise the planning areas according to their regions (Central, North, Northeast, East, West) and visualise the demographic structure within each region. Information on which region each planning area belongs to can be found here. Each planning area is matched to its respective region.
Similar to planning areas, there are many age groups provided in the dataset. However, this level of detail is less useful because the age groups are broken down into small intervals, which makes it hard to identify macro patterns in the demographic structure. As such, it is more meaningful to categorise the age groups into 3 broader categories. The following shows the age range used for each category:
Young: 0-24 years old
Active: 25-64 years old
Old: 65 years old and above
Together with the regional data mentioned above, the broader age categories can be used to generate ternary plots which illustrate the overall demographic structure of each planning area, within each region.
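The mapping from the dataset's age-group labels to the three broader categories can be sketched as follows. This is a minimal illustration in base R, not the method used later in this post (which sums columns by position); age_category is a hypothetical helper name.

```r
# Hypothetical helper: map an age-group label such as "25_to_29" or
# "90_and_over" to one of the three broader categories described above.
age_category <- function(age_group) {
  # The lower bound of each band is the number before the first underscore
  lower <- as.integer(sub("_.*$", "", age_group))
  ifelse(lower < 25, "Young", ifelse(lower < 65, "Active", "Old"))
}

labels <- c("0_to_4", "05_to_9", "20_to_24", "25_to_29",
            "60_to_64", "65_to_69", "90_and_over")
categories <- age_category(labels)
print(data.frame(age_group = labels, category = categories))
```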
Two visualisations are proposed to gain insights into the demographic structure of Singapore:
The age-sex pyramid provides a good overview of the distribution of the Singapore population by age cohort, showing which age groups make up the largest or smallest proportions of the total population. This information is further broken down by sex, to show the composition of each age group.
Ternary plots display compositional data made up of three parts. Given the three broader age groups identified above, ternary plots can be used to display the distribution of the young, economically active and old within a planning area. There will be one ternary plot for each region, to prevent overcrowding of points on the ternary diagram so that each point can be read more easily.
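As a minimal sketch of what a ternary plot encodes, each point is placed according to the three shares of its total, which always sum to 1. The counts below are the Ang Mo Kio values from the YOUNG, ACTIVE and OLD columns printed later in this post.

```r
# Counts for Ang Mo Kio, taken from the YOUNG, ACTIVE and OLD columns
# shown later in this post
counts <- c(YOUNG = 46500, ACTIVE = 83850, OLD = 34080)

# A ternary plot positions each point by its shares of the total
shares <- counts / sum(counts)
print(round(shares, 3))
```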
Install the following R packages, which are used to read the dataset and to plot the visualisations. The packages are then loaded into the RStudio environment.
tidyverse allows for data transformation and manipulation
ggthemes provides additional themes for the visualisations below
packages <- c('tidyverse', 'ggthemes')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
The dataset used for this visualisation is Singapore Residents by Subzone and Type of Dwelling, 2011-2019 from data.gov.sg. The file is named planning-area-subzone-age-group-sex-and-type-of-dwelling-june-2011-2019.csv and is in CSV format.
Read the dataset using the read.csv() function, and print the head() to get a clear overview of what information the dataset provides.
demographic_data <- read.csv("planning-area-subzone-age-group-sex-and-type-of-dwelling-june-2011-2019.csv")
head(demographic_data)
## planning_area subzone age_group sex
## 1 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 2 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 3 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 4 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 5 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 6 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## type_of_dwelling resident_count year
## 1 HDB 1- and 2-Room Flats 0 2011
## 2 HDB 3-Room Flats 10 2011
## 3 HDB 4-Room Flats 30 2011
## 4 HDB 5-Room and Executive Flats 50 2011
## 5 HUDC Flats (excluding those privatised) 0 2011
## 6 Landed Properties 0 2011
Filter data relevant only to the year 2019 using the filter() function
Change the naming of the age group “5_to_9” to “05_to_9” using the mutate() function
We can see that the data is now limited to only the year of 2019 under the “year” column.
yr2019 <- demographic_data %>%
  filter(year == 2019) %>%
  # Pad the single-digit label so the age groups sort in ascending order
  mutate(age_group = gsub("5_to_9", "05_to_9", age_group))
yr2019 <- data.frame(yr2019)
head(yr2019)
## planning_area subzone age_group sex
## 1 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 2 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 3 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 4 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 5 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 6 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## type_of_dwelling resident_count year
## 1 HDB 1- and 2-Room Flats 0 2019
## 2 HDB 3-Room Flats 10 2019
## 3 HDB 4-Room Flats 10 2019
## 4 HDB 5-Room and Executive Flats 20 2019
## 5 HUDC Flats (excluding those privatised) 0 2019
## 6 Landed Properties 0 2019
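The unanchored pattern "5_to_9" works here because no other age-group label contains that text. A slightly more defensive variant, sketched below on a few made-up labels, anchors the pattern so that only the exact label can be replaced.

```r
# Made-up labels covering the cases that could be confused with "5_to_9"
age_group <- c("0_to_4", "5_to_9", "45_to_49", "55_to_59", "85_to_89")

# Anchoring with ^ and $ restricts the replacement to the exact label
renamed <- sub("^5_to_9$", "05_to_9", age_group)
print(renamed)
```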
Group the data by age_group and sex using the group_by() function, since planning area is not relevant in this plot
Summarise the resident_count as population to aggregate the resident count for each sex, within each age group, using the summarise() function
Negate the population numbers for males (multiply by -1) with an ifelse() condition so that their bars are reversed. This ensures that the male population is represented on the left of the pyramid while the female population is represented on the right.
The dataframe is now ready to be used for plotting of the age-sex pyramid.
pyramid_data <- yr2019 %>%
  group_by(age_group, sex) %>%
  summarise(population = sum(resident_count)) %>%
  # Negative values place the male bars to the left of zero
  mutate(pyramid_pop = ifelse(sex == "Males", -population, population))
pyramid_df <- data.frame(pyramid_data)
head(pyramid_df)
## age_group sex population pyramid_pop
## 1 0_to_4 Females 90850 90850
## 2 0_to_4 Males 94730 -94730
## 3 05_to_9 Females 97040 97040
## 4 05_to_9 Males 101290 -101290
## 5 10_to_14 Females 102550 102550
## 6 10_to_14 Males 105830 -105830
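A quick sanity check of the sign flip can be sketched using the six rows printed above: only the sign should change, never the magnitude. The data frame name pyramid_head is hypothetical.

```r
# First six rows of pyramid_df as printed above
pyramid_head <- data.frame(
  age_group = c("0_to_4", "0_to_4", "05_to_9", "05_to_9", "10_to_14", "10_to_14"),
  sex = c("Females", "Males", "Females", "Males", "Females", "Males"),
  population = c(90850, 94730, 97040, 101290, 102550, 105830)
)
pyramid_head$pyramid_pop <- ifelse(pyramid_head$sex == "Males",
                                   -pyramid_head$population,
                                   pyramid_head$population)

# Magnitudes must match the original counts; male values must be negative
same_magnitude <- all(abs(pyramid_head$pyramid_pop) == pyramid_head$population)
males_negative <- all(pyramid_head$pyramid_pop[pyramid_head$sex == "Males"] < 0)
print(c(same_magnitude = same_magnitude, males_negative = males_negative))
```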
Use ggplot and geom_bar() to plot the pyramid.
stat="identity" is set in geom_bar because y values are provided, unlike a normal geom_bar(), which by default counts the rows for each x value
Set the scale for the population in thousands using scale_y_continuous and include the corresponding labels
coord_flip is used to change the orientation of the bars to horizontal
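The axis labels for the scale step can also be derived from the breaks rather than written out by hand. The sketch below uses a hypothetical helper, thousands_label, which is not part of the code in this post.

```r
# Hypothetical label helper: show the magnitude of each break in thousands
thousands_label <- function(x) as.character(abs(x) / 1000)

breaks <- seq(-150000, 150000, 50000)
print(thousands_label(breaks))
```

Since the labels argument of scale_y_continuous() also accepts a function, such a helper could be passed as labels = thousands_label so that the labels stay in step with the breaks.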
pyramid <- ggplot(pyramid_df, aes(x = age_group, y = pyramid_pop, fill = sex)) +
  geom_bar(data = subset(pyramid_df, sex == "Females"), stat = "identity") +
  geom_bar(data = subset(pyramid_df, sex == "Males"), stat = "identity") +
  # Breaks are in residents; labels show the magnitude in thousands on both sides
  scale_y_continuous(breaks = seq(-150000, 150000, 50000),
                     labels = paste0(as.character(c(seq(150, 0, -50), seq(50, 150, 50))))) +
  coord_flip() +
  scale_fill_manual(values = c("lightpink2", "steelblue3")) +
  ggtitle("Singapore Population Pyramid, 2019") +
  xlab("Age Group") +
  ylab("Population in thousands") +
  theme_fivethirtyeight() +
  theme(axis.title = element_text())
pyramid
As mentioned before, there are 55 planning areas, which makes a single ternary plot difficult to read because of the overlapping points.
Keep only the relevant columns by dropping the columns subzone, sex, type_of_dwelling and year
Group by planning_area and age_group and summarise resident_count as population to aggregate the resident count in each age group within each planning area
Create a vector for each region, with its corresponding planning areas. No vector is created for the 5th region (West) because any planning area not found in the other four vectors is assigned to West by the final branch of the ifelse() function.
Match the planning areas to their respective regions using the %in% operator and the ifelse() condition.
subset <- subset(yr2019, select = -c(subzone, sex, type_of_dwelling, year)) %>%
  group_by(planning_area, age_group) %>%
  summarise(population = sum(resident_count))
subset <- data.frame(subset)
Central=c("Bishan", "Bukit Merah","Bukit Timah", "Downtown Core", "Geylang", "Kallang", "Marina East", "Marina South", "Marine Parade", "Museum", "Newton", "Novena", "Orchard", "Outram", "Queenstown", "Dover", "Ghim Moh", "River Valley", "Rochor", "Singapore River", "Southern Islands", "Straits View", "Tanglin", "Toa Payoh")
East=c("Bedok", "Changi", "Changi Bay", "Pasir Ris", "Paya Lebar", "Tampines")
North=c("Central Water Catchment", "Lim Chu Kang", "Mandai", "Sembawang", "Simpang", "Sungei Kadut", "Woodlands", "Yishun")
Northeast=c("Ang Mo Kio", "Hougang", "North-Eastern Islands", "Punggol", "Seletar","Sengkang", "Serangoon")
regional <- mutate(subset,
  Region = ifelse(subset$planning_area %in% Central, "Central",
           ifelse(subset$planning_area %in% East, "East",
           ifelse(subset$planning_area %in% North, "North",
           # Any planning area not listed in the four vectors falls through to West
           ifelse(subset$planning_area %in% Northeast, "North-east", "West")))))
regional_df <- data.frame(regional)
head(regional_df)
## planning_area age_group population Region
## 1 Ang Mo Kio 0_to_4 5420 North-east
## 2 Ang Mo Kio 05_to_9 6230 North-east
## 3 Ang Mo Kio 10_to_14 7380 North-east
## 4 Ang Mo Kio 15_to_19 7930 North-east
## 5 Ang Mo Kio 20_to_24 8920 North-east
## 6 Ang Mo Kio 25_to_29 10620 North-east
We can see that there are 5 unique regions now.
unique(regional_df$Region)
## [1] "North-east" "East" "Central" "West" "North"
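As an alternative to nested ifelse() calls, the region assignment can be sketched with a lookup table. This is only an illustration under assumed names: the region lists below are shortened, and regions, lookup and assign_region are hypothetical.

```r
# Shortened, illustrative region lists (the full vectors are defined
# earlier in this post)
regions <- list(
  Central = c("Bishan", "Bukit Merah"),
  East = c("Bedok", "Tampines"),
  North = c("Woodlands", "Yishun"),
  `North-east` = c("Ang Mo Kio", "Hougang")
)

# stack() turns the named list into a two-column table:
# values (planning area) and ind (region)
lookup <- stack(regions)

assign_region <- function(planning_area) {
  region <- as.character(lookup$ind[match(planning_area, lookup$values)])
  # Areas not found in the lookup default to West, mirroring the
  # final ifelse() branch
  ifelse(is.na(region), "West", region)
}

areas <- c("Ang Mo Kio", "Bedok", "Jurong East", "Yishun")
print(assign_region(areas))
```

A lookup table keeps the region lists in one place, so adding or moving a planning area does not require editing the branching logic.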
To create the ternary plot:
Use the spread() function to pivot the dataframe so that each age group becomes its own column. This makes it easier to aggregate the population for each age category by selecting the relevant age-group columns
Use the mutate() function with rowSums() to add up the relevant age-group columns for each category, deriving the three new measures (YOUNG, ACTIVE and OLD) and their TOTAL.
Remove planning areas which have no residential population (TOTAL=0) by using the filter() function
The dataframe is now ready to be used for the ternary plot.
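agpop_mutated <- regional_df %>%
  spread(age_group, population) %>%
  # Columns 1-2 are planning_area and Region; the age groups start at column 3.
  # Columns 3:8 run from 0_to_4 to 25_to_29, so YOUNG as computed here covers ages 0-29
  # and ACTIVE (9:15) covers 30-64; use .[3:7] and .[8:15] to match the 0-24 / 25-64 bands exactly.
  mutate(YOUNG = rowSums(.[3:8])) %>%
  mutate(ACTIVE = rowSums(.[9:15])) %>%
  mutate(OLD = rowSums(.[16:21])) %>%
  # Columns 22:24 are YOUNG, ACTIVE and OLD
  mutate(TOTAL = rowSums(.[22:24])) %>%
  filter(TOTAL > 0)
agpop_mutated_df <- data.frame(agpop_mutated)
head(agpop_mutated_df)
## planning_area Region X0_to_4 X05_to_9 X10_to_14 X15_to_19 X20_to_24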
## 1 Ang Mo Kio North-east 5420 6230 7380 7930 8920
## 2 Bedok East 10020 11640 13300 14640 16660
## 3 Bishan Central 2850 3850 4430 4740 5570
## 4 Bukit Batok West 7130 6640 7800 8800 9850
## 5 Bukit Merah Central 6100 6650 6640 6380 6850
## 6 Bukit Panjang West 6700 7230 7680 8500 9570
## X25_to_29 X30_to_34 X35_to_39 X40_to_44 X45_to_49 X50_to_54 X55_to_59
## 1 10620 10510 10940 11760 12570 12170 13090
## 2 19530 17940 18310 20070 21290 20870 22550
## 3 7090 5430 5290 5940 6860 6510 7220
## 4 12510 12480 10600 10690 11680 12010 12450
## 5 9140 10550 11050 11830 11780 10790 11100
## 6 10560 10740 10230 9610 10610 10450 11410
## X60_to_64 X65_to_69 X70_to_74 X75_to_79 X80_to_84 X85_to_89 X90_and_over
## 1 12810 11970 8960 6160 3840 2110 1040
## 2 21830 18810 13660 8300 5600 3130 1820
## 3 7140 5730 3880 2540 1670 970 520
## 4 11590 8560 5020 2930 1820 1020 560
## 5 11270 10370 8310 5990 4190 2220 1390
## 6 9970 6910 4230 2470 1560 820 450
## YOUNG ACTIVE OLD TOTAL
## 1 46500 83850 34080 164430
## 2 85790 142860 51320 279970
## 3 28530 44390 15310 88230
## 4 52730 81500 19910 154140
## 5 41760 78370 32470 152600
## 6 50240 73020 16440 139700
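Summing columns by position depends on the exact column order after pivoting. A possible alternative, sketched below on a made-up table with only a few age-group columns, selects the columns for each category by name instead. The names wide, young_cols, active_cols and old_cols are hypothetical.

```r
# Toy wide table with a few age-group columns (made-up counts)
wide <- data.frame(
  planning_area = c("A", "B"),
  `0_to_4` = c(10, 20), `20_to_24` = c(30, 40),
  `25_to_29` = c(50, 60), `65_to_69` = c(70, 80),
  check.names = FALSE
)

# Name the columns for each category instead of relying on their positions
young_cols <- c("0_to_4", "20_to_24")
active_cols <- c("25_to_29")
old_cols <- c("65_to_69")

# drop = FALSE keeps a data frame even when a category has a single column
wide$YOUNG <- rowSums(wide[, young_cols, drop = FALSE])
wide$ACTIVE <- rowSums(wide[, active_cols, drop = FALSE])
wide$OLD <- rowSums(wide[, old_cols, drop = FALSE])
print(wide[, c("planning_area", "YOUNG", "ACTIVE", "OLD")])
```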
Use the ggtern library to create the overall ternary plot of the Singapore population in 2019.
Load the ggtern library. This is not done together with the earlier code chunk that loads the other R packages because there are conflicts between ggtern and tidyverse
Color is set to Region so that each region is represented by a different colour
Size is set to TOTAL so that the size of each point is proportional to the population of the planning area it represents
library(ggtern)
overall_ternary <- ggtern(data = agpop_mutated_df, aes(x = YOUNG, y = ACTIVE, z = OLD, color = Region, size = TOTAL)) +
  geom_point(alpha = 0.5) +
  labs(title = "Overall Demographic Structure in Singapore by Region, 2019") +
  theme_tropical()
overall_ternary
While the overall ternary diagram provides an overview of the demographic composition within each region, we can plot individual regional ternary plots to see the demographic composition for each planning area.
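The regional plots differ only in the region name, so the shared steps could be wrapped in a helper. The sketch below is one possible arrangement, not the code used in this post: region_subset and region_ternary are hypothetical names, the toy data is made up, and region_ternary assumes that library(ggtern) has already been attached as described above.

```r
# Hypothetical helper: the rows and plot title for one region
region_subset <- function(df, region) {
  list(
    data = df[df$Region == region, , drop = FALSE],
    title = paste0("Demographic Structure in ", region, " region, 2019")
  )
}

# Hypothetical helper: builds the ternary plot for one region.
# Assumes library(ggtern) is attached; it is only defined here, not called.
region_ternary <- function(df, region) {
  s <- region_subset(df, region)
  ggtern(data = s$data,
         aes(x = YOUNG, y = ACTIVE, z = OLD, color = planning_area, size = TOTAL)) +
    geom_point(alpha = 0.5) +
    labs(title = s$title) +
    theme_tropical()
}

# Made-up data with the same column names as agpop_mutated_df
toy <- data.frame(
  planning_area = c("Woodlands", "Bedok"),
  Region = c("North", "East"),
  YOUNG = c(30, 25), ACTIVE = c(50, 50), OLD = c(20, 25)
)
toy$TOTAL <- toy$YOUNG + toy$ACTIVE + toy$OLD

north <- region_subset(toy, "North")
print(north$title)
```

With ggtern attached, a call such as region_ternary(agpop_mutated_df, "North") would then stand in for one of the repeated blocks.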
Use the filter() function to select the data for a specific region.
North Region
#NORTH region
north <- agpop_mutated %>% filter(Region == "North")
north_tp <- ggtern(data = north, aes(x = YOUNG, y = ACTIVE, z = OLD, color = planning_area, size = TOTAL)) +
  geom_point(alpha = 0.5) +
  labs(title = "Demographic Structure in North Region") +
  theme_tropical()
north_tp
Central Region
#CENTRAL region
central <- agpop_mutated %>% filter(Region == "Central")
central_tp <- ggtern(data = central, aes(x = YOUNG, y = ACTIVE, z = OLD, color = planning_area, size = TOTAL)) +
  geom_point(alpha = 0.5) +
  labs(title = "Demographic Structure in Central Region") +
  theme_tropical()
central_tp
West Region
#WEST region
west <- agpop_mutated %>% filter(Region == "West")
west_tp <- ggtern(data = west, aes(x = YOUNG, y = ACTIVE, z = OLD, color = planning_area, size = TOTAL)) +
  geom_point(alpha = 0.5) +
  labs(title = "Demographic Structure in West region, 2019") +
  theme_tropical()
west_tp
Northeast Region
#NORTH-EAST region
northeast <- agpop_mutated %>% filter(Region == "North-east")
northeast_tp <- ggtern(data = northeast, aes(x = YOUNG, y = ACTIVE, z = OLD, color = planning_area, size = TOTAL)) +
  geom_point(alpha = 0.5) +
  labs(title = "Demographic Structure in North-east region, 2019") +
  theme_tropical()
northeast_tp
East Region
#EAST region
east <- agpop_mutated %>% filter(Region == "East")
east_tp <- ggtern(data = east, aes(x = YOUNG, y = ACTIVE, z = OLD, color = planning_area, size = TOTAL)) +
  geom_point(alpha = 0.5) +
  labs(title = "Demographic Structure in East region, 2019") +
  theme_tropical()
east_tp
From the age-sex pyramid, we can tell that Singapore has an ageing population: the largest share of the population is economically active, and an increasing share belongs to the aged population, reflecting longer life expectancy among Singaporeans. The base is relatively narrow, which reflects declining birth rates and thus a smaller proportion of young people. Additionally, females make up a larger proportion than males among Singaporeans aged 75 and above, while the proportions of males and females in the other age groups are similar.
From the overall ternary plot of the demographic structure in Singapore, we can tell that the demographic composition of most regions follows the age-sex pyramid of Singapore, with around 30-35% of the population belonging to the young population, 18-20% belonging to the old population and the remaining 45-55% being economically active. There are, however, some exceptions, especially in the Central region (coloured pink). Planning areas belonging to this region seem to have a lower proportion of both the young and the old population, and a higher proportion of the economically active, compared to the majority of other planning areas.
From the individual regional ternary plots, we can gain insights into Singapore's demographics geographically, as each point represents a planning area within the region. We can identify the planning areas mentioned earlier which have a higher proportion of economically active population, such as Marine Parade in the Central region. In the North region, Sungei Kadut has a much higher proportion of the older population (around 21%) compared to other planning areas within the region (around 15%). In the West region, Western Water Catchment has a much smaller proportion of the old (1-2%) compared to other planning areas within the region (around 15-20%). These insights will be useful for policymakers and urban planners, who can better take into consideration the needs of the particular demographic within each planning area. For example, for Sungei Kadut, urban planners can examine whether the eldercare facilities there are adequate to meet the needs of an older population.