A work by Chen Fangqi

fangqi.chen.2017@smu.edu.sg

1. Overview

In this makeover, I’ll be visualizing the age distribution of Singapore and why it should be a concern for us. The data I will be using comes directly from Singapore’s Official Statistics source: SingStat. The data contains the these key elements: Age Group, Sex and Type of Dwelling from June 2011 to June 2020.

1.1 Purpose of Visualization

Singapore, like many other countries is facing the problem of an aging society. One would ask the question, is Singapore ready for an aging workforce and a rise in the need for old aged care. There would be an increase in the financial needs of the less fortunate who are aging but do not have enough savings to enjoy their retirement.

With an aging society, there is also an increase in the demand for aged care services as noted in a Straits Times article. This visualization could potentially help the aged care businesses to know where they should set up their facilities. For the purpose of this visualization and to fit the data, I have altered the time frame of the various generations. In the table below, you can see how it’s being altered.

Generation	YearBornedOriginal	YearBornedAltered
Generation Alpha	2012 to 2021	After 2012
Generation Z	1997 to 2012	1997 to 2011
Millennials	1981 to 1996	1982 to 1996
Generation X	1965 to 1980	1968 to 1981
Baby Boomers	1946 to 1964	1945 to 1967
Silent Generation	1928 to 1945	Before 1946

These are the current generations’ age ranges:

Generation	YearBornedAltered	Age
Generation Alpha	After 2012	Under 9
Generation Z	1997 to 2011	10 to 24
Millennials	1982 to 1996	25 to 39
Generation X	1968 to 1981	41 to 54
Baby Boomers	1945 to 1967	55 to 74
Silent Generation	Before 1946	Over 75

1.2. Sketch of Proposed DataViz Design

Sketch of Proposed DataViz Design

2. Suggestions

3. DataViz Step by Step

3.1 The installation and loading of packages

formattable is used in the tables above to easily come up with a table using a dataframe
ggplot2 is used to create the various visualizations such as stacked bar charts, Choropleth maps, etc
tidyverse is used for data manipulation
dplyr is used to create pipes of the data as well as read_csv
sf is for map data
packrat and rsconnect are essentials for publishing of R documents

packages = c( 'formattable', 'ggplot2', 'tidyverse', 'dplyr', 'sf', 'packrat', 'rsconnect')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Load the Map Data that is downloaded from SingStat for Geospatial Visualization later

mpsz <- st_read(dsn = "data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")

3.2 Loading the Data

Importing respopagesextod2011to2020.csv into R. The read_csv() function of readr package will be used.

pop_data <- read_csv("data/respopagesextod2011to2020.csv")

3.3 Data Preparation

3.3.1 Correction of category

As the data when mutated would be ordered by ascending order, this meant that 5_9 would actually be behind rather than infront, so we will have to add a 0 infront to make sure the data is in the right order.

pop_data$AG[pop_data$AG == "5_to_9"] <- '05_to_09'

3.3.2 Assigning a generation based on column AG

This is in accordance to the table in 1.1 This is for the stacked bar chart later in the data visualization portion

pop_data_generation <- pop_data[,c("AG","Sex","Pop", "SZ", "Time", "PA")]
pop_data_generation$Generation <- ifelse(pop_data$AG == "0_to_4" | pop_data$AG == "05_to_09", "GenAlpha", 
                              ifelse(pop_data$AG == "10_to_14" | pop_data$AG == "15_to_19" | pop_data$AG == "20_to_24","GenZ", 
                                     ifelse(pop_data$AG == "25_to_29" | pop_data$AG == "30_to_34" | pop_data$AG == "34_to_39", "Millineals",
                                            ifelse(pop_data$AG == "40_to_44" | pop_data$AG == "45_to_49" | pop_data$AG == "50_to_54", "GenX",
                                                   ifelse(pop_data$AG == "55_to_59" | pop_data$AG == "60_to_64" | pop_data$AG == "65_to_69" | pop_data$AG == "70_to_74", "Boomers", "SilentGeneration")))))
pop_data_generation <- subset(pop_data_generation, Time == '2020')

3.3.3 Sorting into Generations

Now, we will have to mutate the data according to the ages and sort them into their generations

#Deriving the young, economy active and old measures
agpop_mutated <- pop_data %>%
  spread(AG, Pop) %>%
  mutate(GenAlpha = rowSums(.[6:7]))%>%
  mutate(GenZ = rowSums(.[8:10]))  %>%
  mutate(Millineals = rowSums(.[11:13])) %>%
  mutate(GenX = rowSums(.[14:16])) %>%
  mutate(Boomers = rowSums(.[17:20])) %>%
  mutate(SilentGeneration = rowSums(.[21:23])) %>%
  mutate(TOTAL = rowSums(.[24:30])) %>%
  filter(Time == 2020)%>%
  filter(TOTAL > 0)
df2 <- data.frame(agpop_mutated)

3.3.3 Preparing a dataframe for the Age-Gender Pyramid

Filtering the necessary data to create an age gender pyramid for the year 2020

age_gender_df <- subset(pop_data[,c("AG","Sex","Pop", "Time")], Time == '2020')
age_gender_df <- aggregate(formula = Pop ~ Sex + AG, data = age_gender_df, FUN = sum)
age_gender_df <- with(age_gender_df, age_gender_df[order(Sex,AG),])
age_gender_df$Pop <- ifelse(age_gender_df$Sex == "Males", -1*age_gender_df$Pop, age_gender_df$Pop)

3.3.4 Preparing a dataframe for Generation-Gender Pyramid

Filtering the necessary data to create an generation gender pyramid for the year 2020

gen_gender_df <- subset(pop_data_generation[,c("Generation","Sex","Pop", "Time")], Time == '2020')
gen_gender_df <- aggregate(formula = Pop ~ Sex + Generation, data = gen_gender_df, FUN = sum)
gen_gender_df <- with(gen_gender_df, gen_gender_df[order(Sex,Generation),])
gen_gender_df$Pop <- ifelse(gen_gender_df$Sex == "Males", -1*gen_gender_df$Pop, gen_gender_df$Pop)

3.3.5 Data Frames for generation in concern [SilentGeneration, GenX and Boomers] who would be in their 60-90s in 5-10 years time

Create 3 separate dataframes to visualize the top 3 regions that the various generations are staying in

filtered_agpop_silent_generation <- agpop_mutated %>%
  group_by(SZ) %>%
  tally(SilentGeneration) %>%
  top_n(4)

filtered_agpop_boomer <- agpop_mutated %>%
  group_by(SZ) %>%
  tally(Boomers) %>%
  top_n(4)

filtered_agpop_genx <- agpop_mutated %>%
  group_by(SZ) %>%
  tally(GenX) %>%
  top_n(4)

3.3.6 Dataframe for conclusion chart

This dataframes are meant to be for the conclusion, it is a dataframe that totals each generation across time This dataframe follows the following grouping criteria: * Age: 0 to 24 | Students * Age: 25 to 64 | WorkingAdults * Age: 65 and above | Retired

This is in accordance to Singapore’s re-employment maximum age of 68 by 2022 and a minimum retirement age of 62

gen_total_data <- pop_data[,c("AG","Pop","Time")]
gen_total_data$Group <- ifelse(pop_data$AG == "0_to_4" | pop_data$AG == "05_to_09" | pop_data$AG == "10_to_14" | pop_data$AG == "15_to_19" | pop_data$AG == "20_to_24", "Students",
ifelse(pop_data$AG == "25_to_29" | pop_data$AG == "30_to_34" | pop_data$AG == "34_to_39" | pop_data$AG == "40_to_44" | pop_data$AG == "45_to_49" | pop_data$AG == "50_to_54" | pop_data$AG == "55_to_59" | pop_data$AG == "60_to_64", "WorkingAdults", "Retired"))
total_data = aggregate(gen_total_data$Pop, by=list(Category=gen_total_data$Time, gen_total_data$Group), FUN=sum)
colnames(total_data)[1] <- "Time"
colnames(total_data)[2] <- "Group"
colnames(total_data)[3] <- "Pop"

4. Visualization

4.1 Age Distribution using a Gender Pyramid

As visualized below, we can see that the largest concentration are in their 40s to 50s and in 10 years time, they would be in their elderly age but there would not be equivalent 20s to 30s year old to replace these people in their 40s to 50s, this would create a gap in the workforce as there aren’t enough people to replace the 40s to 50s when they are not “Economically Active”

age_gender_pyramid <- ggplot(age_gender_df, aes(x = AG, y = Pop, fill = Sex)) + 
  geom_bar(data = subset(age_gender_df, Sex == "Females"), stat = "identity") +
  geom_bar(data = subset(age_gender_df, Sex == "Males"), stat = "identity") + 
  xlab("Age") + ylab("Population in millions") + 
  scale_y_continuous(labels = paste0(as.character(c(seq(2, 0, -1), seq(1, 2, 1))), "m")) + 
  coord_flip()
age_gender_pyramid + theme_classic() + labs(title = "Age Gender Distribution of Singapore 2020",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))

4.2 Generation Distribution using a Gender Pyramid

As mentioned above, this chart further emphasizes the disparity in population, when we compare the Boomers with GenX and even GenZ, there’s a huge difference in the population size.

generation_gender_pyramid <- ggplot(gen_gender_df, aes(x = reorder(Generation,Pop), y = Pop, fill = Sex)) + 
  geom_bar(data = subset(gen_gender_df, Sex == "Females"), stat = "identity") +
  geom_bar(data = subset(gen_gender_df, Sex == "Males"), stat = "identity") + 
  xlab("Age") + ylab("Population in millions") + 
  scale_y_continuous(labels = paste0(as.character(c(seq(2, 0, -1), seq(1, 2, 1))), "m")) + 
  coord_flip()
generation_gender_pyramid + theme_classic() + labs(title = "Generation Gender Distribution of Singapore 2020 [Sorted by Population of Generation]",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))

4.3 Geospatial Visualization

4.3.1 Silent Generation

As seen in the visualization below, these are the top 4 Subzones with the population of people from the Silent Generation * Tampines East * Tampines West * Bedok North * Bedok South

viz <- ggplot(filtered_viz_silent_gen) + 
  geom_sf(aes(fill=Population), size = .05) + geom_sf_label(aes(label = SUBZONE_N))

viz + theme_void() + labs(title = "Top 4 places the Silent Generations are staying in",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))

4.3.2 Boomers

As seen in the visualization below, these are the top 4 Subzones with the population of people from the Boomers generation * Tampines East * Tampines West * Woodlands East * Bedok North

viz <- ggplot(filtered_viz_boomer) + 
  geom_sf(aes(fill=Population), size = .05) + geom_sf_label(aes(label = SUBZONE_N))

viz + theme_void() + labs(title = "Top 4 places the Boomers are staying in",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))

4.3.3 Generation X

As seen in the visualization below, these are the top 4 Subzones with the population of people from Generation X * Tampines East * Jurong West Central * Woodlands East * Bedok North

viz <- ggplot(filtered_viz_genx) + 
  geom_sf(aes(fill=Population), size = .05) + geom_sf_label(aes(label = SUBZONE_N))

viz + theme_void() + labs(title = "Top 4 places the Generation X are staying in",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))

4.3.4 Stacked Horizontal Bar Chart to show Generation Distribution

This chart omits those Subzones with less than 1000 people living in it It shows the composition of generations living in the Subzones. As seen in the visualization below, Tampines East and Woodlands East are Populated with all 3 aging generation

pop_data_generation <- subset(pop_data_generation, Pop > 1000)
pop_data_generation <- na.omit(pop_data_generation)
ggplot(pop_data_generation, aes(fill=Generation, x=Pop, y=reorder(SZ, Pop))) + 
    geom_bar(position="stack", stat="identity") + theme_classic() + xlab("Population") + ylab("Planning Area") + labs(title = "Stacked Bar Chart for Subzone-Generation Distribution",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))

5. Conclusion

As the population gets older, the healthcare expenses of the older citizens would increase and the government should plan early especially with a declining birth rate. If the amount of working adults crosses over with the amount of people who are retired, we would not have enough people who are working/economically active.

p <- ggplot(data=subset(total_data, Group == "Retired" | Group == "WorkingAdults"), aes(fill=Group, x=Time, y=Pop, color=Group)) +
  geom_line(size = 1) +
  geom_point() + scale_y_continuous(label=comma) + scale_x_continuous(breaks = c(2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020)) + theme_classic()
p + labs(title = "Working Adults vs Retired Group 2011-2020",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))

Moreover, with a declining birth rate, in the next few decades, as noted in the total population in the Students group, there would not be sufficient people to replace the current “Working Adults” group.

p <- ggplot(data=subset(total_data, Group == "Students" | Group == "WorkingAdults"), aes(x=Time, y=Pop, color=Group)) +
  geom_line(size = 1) +
  geom_point() + scale_y_continuous(label=comma) + scale_x_continuous(breaks = c(2011, 2015, 2020)) + theme_classic()
p + facet_wrap(~Group) + labs(title = "Students vs Working Adults Group 2011-2020",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))

The government should focus on getting the aged care services businesses to open in areas with a high concentration of aging adults such as:

Tampines
Woodlands
Bedok

With all this being said, we look at things from the brighter side. Automation and AI might eventually help us reduce the amount of people required in the workforce, which could potentially solve the issue of an aging workforce as noted in this article

The reality of an ageing society.

IS428 Visual Analytics for Business Intelligence | DataViz Makeover 8

Chen Fangqi

22 March 2021