Data Challenges
1.1) Whilst the age group is provided, the age category (young, economically active or aged) is not provided for any planning area. For example, we cannot tell which age category the Ang Mo Kio population in the 0-4 age group belongs to.
1.2) The composition of age groups, i.e. the percentage of the population belonging to the young, aged or economically active groups, is not provided. This makes it difficult to visualise how the composition changes from year to year.
1.3) The year column is not parsed in date-time format when imported into R. This makes it difficult to arrange the year variable chronologically and view the changes in demographic composition in sequence.
Design Challenges
1.4) It is difficult to show five separate dimensions (planning area, year, and the change in demographic composition of the young, aged and economically active groups) in one static visualisation. Compared to a static visual, an animated visual can step through the year/time dimension frame by frame, so that only four dimensions need to be shown in any one chart/graph.
1.5) The amount of information is large and complex - large because of the sheer number of planning areas (55 in total) and complex because we are showcasing the change in demographic composition across three dimensions. It is difficult to create a visual representation that allows the human mind to visualise and understand this large and complex information easily in order to draw meaningful insights immediately.
1.6) Given that we are measuring the change in three different dimensions (the age groups) across space and time, it is difficult to compare the three dimensions using the same metric. However, a common metric is needed to ensure a like-for-like comparison.
2.1) We will need to programmatically re-code the “AG” column into a new “age-group” column where each age range is assigned its corresponding category: young, aged or economically active.
2.2) We will need to create a new variable that measures the percentage of population within each age group at the planning area and year level. This new variable is calculated by taking the population sum aggregated for each age group divided by the total population sum for that particular planning area and year.
2.3) We will need to recode the year-time into date-time format or convert the year-time into factors so that the variable can be ordered in levels.
2.4) It is recommended to use three separate visualisations to constitute one overall visual; each visualisation contains the changing demographic pattern for each age-group segmented by its respective planning area and temporal-year series.
2.5) It is recommended to color-code the changing demographic pattern so that both spatial and temporal changes can be easily and quickly interpreted by the human eye. Color-coding also conveys the depth of change through differences in gradient color. This can be done via a heatmap.
2.6) Instead of comparing absolute population numbers across age groups, it might be better to compare percentages. For example, a change in the percentage of aged population in a particular region from, say, 50% to 60% is easier to understand than a change from 100,000 to 150,000.
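The data preparation in 2.1 to 2.3, together with the percentage metric in 2.6, could be sketched in base R roughly as follows. This is a minimal illustration rather than the code behind the report: only the “AG” column is named in the text, so the column names PA, Time and Pop, the toy numbers, and the cut-offs (young 0-14, economically active 15-64, aged 65 and over) are all assumptions.

```r
# Toy data standing in for the SingStat extract (numbers are made up).
# Only "AG" is named in the text; "PA", "Time" and "Pop" are assumed names.
pop <- data.frame(
  PA   = rep(c("Ang Mo Kio", "Punggol"), each = 6),
  Time = rep(rep(c(2011, 2019), each = 3), times = 2),
  AG   = rep(c("0_to_4", "30_to_34", "70_to_74"), times = 4),
  Pop  = c(100, 700, 200,  80, 650, 270,
           300, 600, 100, 350, 600,  50),
  stringsAsFactors = FALSE
)

# 2.1) Re-code AG into an age category.
# Take the lower bound of each band, e.g. "0_to_4" -> 0, "90_and_over" -> 90.
lower_age <- as.numeric(sub("_.*$", "", pop$AG))

# Assumed cut-offs: young 0-14, economically active 15-64, aged 65+.
pop$age_category <- cut(
  lower_age,
  breaks = c(-Inf, 14, 64, Inf),
  labels = c("Young", "Economically Active", "Aged")
)

# 2.2) Percentage of each age category per planning area and year.
group_pop <- aggregate(Pop ~ PA + Time + age_category, data = pop, FUN = sum)
total_pop <- aggregate(Pop ~ PA + Time, data = pop, FUN = sum)
names(total_pop)[names(total_pop) == "Pop"] <- "TotalPop"

shares <- merge(group_pop, total_pop, by = c("PA", "Time"))
shares$pct <- round(100 * shares$Pop / shares$TotalPop, 1)

# 2.3) Convert the year into an ordered factor so it sorts chronologically.
shares$Time <- factor(shares$Time,
                      levels = sort(unique(shares$Time)),
                      ordered = TRUE)

shares[order(shares$PA, shares$Time, shares$age_category), ]
```

Because each row is expressed as a share of its own planning area and year, the three age categories can be compared on the same 0-100 scale regardless of how large the planning area is.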
My Sketch
knitr::include_graphics("image/sketch.jfif")
Plot the heatmap for each age-group by calling the corresponding “heatmap” variable
heatmap_aged
heatmap_young
heatmap_EA
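For context, a heatmap object of this kind might be assembled with ggplot2 along the following lines. This is a hedged sketch, not the code that produced the objects above: the data frame shares_aged, its columns PA, Time and pct, the illustrative values and the colour choices are all assumptions, and the result is stored under a separate name so it is not confused with heatmap_aged.

```r
library(ggplot2)

# Hypothetical prepared data: one row per planning area and year, with the
# percentage of residents in the aged category (values are made up).
shares_aged <- data.frame(
  PA   = rep(c("Ang Mo Kio", "Outram", "Punggol"), each = 3),
  Time = factor(rep(c(2011, 2015, 2019), times = 3), ordered = TRUE),
  pct  = c(12, 16, 20,
           15, 19, 24,
            3,  4,  5)
)

# One tile per planning area and year; the fill gradient encodes the share.
heatmap_aged_sketch <- ggplot(shares_aged, aes(x = Time, y = PA, fill = pct)) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "white", high = "darkred", name = "% aged") +
  labs(title = "Share of aged population by planning area",
       x = "Year", y = "Planning area") +
  theme_minimal()

# Printing the object draws the plot, as with the calls above.
heatmap_aged_sketch
```

The same pattern would be repeated for the young and economically active categories, changing only the input data and the legend title.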
Short description: Heatmap of Singapore’s Changing Demographics by Age-Group Across Planning Areas (2011-2019)
Data Source: https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data
Useful information:
Based on the colour gradient alone, we can observe a significant increase (> 20%) in the aged population in Outram and Sungei Kadut in the past five years. This allows policy-makers to delve into a second-order analysis to understand what is driving the increase in the ageing population there.
The highest proportions of young residents are found in Changi, Mandai, Pasir Ris, Punggol, Sembawang, Sengkang, Western Water Catchment and Woodlands. Not surprisingly, some of these are new estates rather than relatively mature ones.
The economically active population is the largest group in Singapore; on average, most of the color gradient lies in the 60-70% and 70-80% ranges.
The color gradient for the economically active population tends to be more stable than for the young and aged populations. One likely reason is that more age brackets fall into this group.
There are certain planning areas in Singapore where there is relatively little change in demographic structure across all the years covered. For example, Sembawang and Serangoon are mature estates where the demographic structure remains relatively unchanged.
R is better for creating complex visualisations like heatmaps because the ggplot2 package is powerful enough to create them by simply calling the right functions.
R is better for customised visualisations as the arguments in the various options can be tweaked according to one’s preferences. On the other hand, Tableau has very limited options to customise a graphic to one’s specific preferences.
R is more easily scalable across multiple large data-sets because the code base is the same and can be easily replicated to accommodate expansion in data size. Whilst the source file for Tableau can be changed, the data structure must be in the same format to create the same visualisation, which makes replicating the same visualisation quickly quite inconvenient.