This visualization aims to reveal the demographic structure of Singapore population by age cohort and by planning area in 2019.
| Type of Challenge | Description |
|---|---|
| Design Challenge | There are too many age groups to fill into one chart. This will cases the plot become very cluttered. |
| Design Challenge | To present both age group and planning area attributes in one visualization. |
| Data Challenge | Due to the data type of the Age groups, the group “5_to_9” is placed after group “45_to_49” instead of “0_to_4”. |
| Challenge | Solution |
|---|---|
| There are too many age groups to fill into one chart. This will cases the plot become very cluttered. | Regroup the ages into 3 groups. |
| To present both age group and planning area attributes in one visualization. | Use ternary plot to present the age in axes. And plot the circle’s sizes with total population and colour with the region name. |
| Due to the data type of the Age groups, the group “5_to_9” is placed after group “45_to_49” instead of “0_to_4”. | Hard code the column index to specify which column to use while summing up the population. |
The code chunk below will check if the R packages in the packaging list have been installed. if not, install the library. After the installation, it will also load the R packages in R.
packages <- c('rgdal', 'spdep', 'tmap', 'tidyverse', 'prettydoc', 'sf', 'magick', 'plotly')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
pop <- read_csv("../data/aspatial/respopagesextod2011to2020.csv")
mpsz = st_read(dsn = "../data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
The code chunk below are used for:
mpsz_pa_sf <- st_as_sf(mpsz[c("REGION_N", "PLN_AREA_N")])
mpsz_pa_sf <- st_set_crs(mpsz_pa_sf, 3414)
mpsz_pa_sf[rowSums(is.na(mpsz_pa_sf))!=0,]
## Simple feature collection with 0 features and 2 fields
## bbox: xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS: SVY21 / Singapore TM
## [1] REGION_N PLN_AREA_N geometry
## <0 rows> (or 0-length row.names)
mpsz_pa_sf <- st_make_valid(mpsz_pa_sf)
pop_2019 <- pop %>%
filter(Time == 2019) %>%
mutate_at(.vars = vars(PA, SZ),
.funs = funs(toupper))
DT::datatable(
head(pop_2019), extensions = 'FixedColumns',
options =
list(dom = 't',
columnDefs = list(list(width = '100px', targets = c(1, ncol(pop_2019)))),
scrollX = TRUE,
scrollCollapse = TRUE)
)
Using the spread function to convert the age group into columns and the population as the rows.
Mutating new columns Yong, Active, Old, and Total by summing up the values in specific columns.
| Category | Age Group |
|---|---|
| Young | 0 - 24 years old |
| Active | 25 - 64 years old |
| Old | 65 years old and above |
Saving the result to pop_2019_age
pop_2019_sub <- pop_2019 %>%
spread(AG, Pop) %>%
mutate(Young = rowSums(.[6:9])+rowSums(.[15]))%>%
mutate(Active=rowSums(.[10:14])+rowSums(.[16:18])) %>%
mutate(Old = rowSums(.[19:24])) %>%
mutate(Total = rowSums(.[25:27]))
pop_2019_age <- data.frame(pop_2019_sub)
DT::datatable(
head(pop_2019_age), extensions = 'FixedColumns',
options =
list(dom = 't',
columnDefs = list(list(width = '100px', targets = c(1, ncol(pop_2019_age)))),
scrollX = TRUE,
scrollCollapse = TRUE)
)
Create another object mpsz_pop_2019_age by joining the planning area (PA and PLN_AREA_N).
mpsz_pop_2019_age <- left_join(pop_2019_age, mpsz_pa_sf, by= c("PA"="PLN_AREA_N"))
DT::datatable(
head(mpsz_pop_2019_age), extensions = 'FixedColumns',
options =
list(dom = 't',
columnDefs = list(list(width = '100px', targets = c(1, ncol(mpsz_pop_2019_age)))),
scrollX = TRUE,
scrollCollapse = TRUE)
)
Create a function to format the values in each ternary axis.
axis <- function(txt) {
list(
title = txt, tickformat = ".0%", tickfont = list(size = 10)
)
}
Creating a list call ternary_axes to store ternary axes values.
ternary_axes = list(
aaxis = axis("Active"),
baxis = axis("Young"),
caxis = axis("Old")
)
Create the ternary chart.
ternary_chart <- plot_ly(mpsz_pop_2019_age,
a = ~Active,
b = ~Young,
c = ~Old,
color = ~REGION_N,
text = ~PA,
size = ~Total,
marker = list(
line = list(color = 'rgba(0, 0, 0, .5)',
width = 0.5)),
type = "scatterternary"
)
Plot the ternary chart with setting the tile, ternary, and margin.
fig <- ternary_chart %>% layout(
title = list(text = "Demographic Structure of Singapore \nby Age Group and Planning Area in 2019"),
ternary = ternary_axes,
margin = list(l = 0, r = 0, b = 50, t = 130)
)
Use the interactive plot to figure out the findings further.
fig
In this ternary plot, the age groups are now divided into the Young, Active and Old on the 3 axes and the values of the point which corresponds to the 3 axes should add up to 100%. The population in each planning area is presented in the size of the point. And the planning area are grouped into 5 regions Central region, East region, North region, North-east region, west region with different colors.
The 3 main insights gathered from the ternary plot:
Active make up the largest percentage in most of the planning areas in Singapore which shows that majority of Singapore population is age between 25 and 64 years old.Old comparing to the age group Young.West region is clustering at the top-left corner of the plot. This means that the age distribution in most of the planning area under the West region has not much difference.