A work by Chen Fangqi
fangqi.chen.2017@smu.edu.sg
In this makeover, I’ll be visualizing the age distribution of Singapore and why it should be a concern for us. The data I will be using comes directly from Singapore’s Official Statistics source: SingStat. The data contains the these key elements: Age Group, Sex and Type of Dwelling from June 2011 to June 2020.
Singapore, like many other countries is facing the problem of an aging society. One would ask the question, is Singapore ready for an aging workforce and a rise in the need for old aged care. There would be an increase in the financial needs of the less fortunate who are aging but do not have enough savings to enjoy their retirement.
With an aging society, there is also an increase in the demand for aged care services as noted in a Straits Times article. This visualization could potentially help the aged care businesses to know where they should set up their facilities. For the purpose of this visualization and to fit the data, I have altered the time frame of the various generations. In the table below, you can see how it’s being altered.
Generation | YearBornedOriginal | YearBornedAltered |
---|---|---|
Generation Alpha | 2012 to 2021 | After 2012 |
Generation Z | 1997 to 2012 | 1997 to 2011 |
Millennials | 1981 to 1996 | 1982 to 1996 |
Generation X | 1965 to 1980 | 1968 to 1981 |
Baby Boomers | 1946 to 1964 | 1945 to 1967 |
Silent Generation | 1928 to 1945 | Before 1946 |
These are the current generations’ age ranges:
Generation | YearBornedAltered | Age |
---|---|---|
Generation Alpha | After 2012 | Under 9 |
Generation Z | 1997 to 2011 | 10 to 24 |
Millennials | 1982 to 1996 | 25 to 39 |
Generation X | 1968 to 1981 | 41 to 54 |
Baby Boomers | 1945 to 1967 | 55 to 74 |
Silent Generation | Before 1946 | Over 75 |
![]()
Sketch of Proposed DataViz Design
packages = c( 'formattable', 'ggplot2', 'tidyverse', 'dplyr', 'sf', 'packrat', 'rsconnect')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
Load the Map Data that is downloaded from SingStat for Geospatial Visualization later
mpsz <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
Importing respopagesextod2011to2020.csv into R. The read_csv() function of readr package will be used.
pop_data <- read_csv("data/respopagesextod2011to2020.csv")
As the data when mutated would be ordered by ascending order, this meant that 5_9 would actually be behind rather than infront, so we will have to add a 0 infront to make sure the data is in the right order.
pop_data$AG[pop_data$AG == "5_to_9"] <- '05_to_09'
This is in accordance to the table in 1.1 This is for the stacked bar chart later in the data visualization portion
pop_data_generation <- pop_data[,c("AG","Sex","Pop", "SZ", "Time", "PA")]
pop_data_generation$Generation <- ifelse(pop_data$AG == "0_to_4" | pop_data$AG == "05_to_09", "GenAlpha",
ifelse(pop_data$AG == "10_to_14" | pop_data$AG == "15_to_19" | pop_data$AG == "20_to_24","GenZ",
ifelse(pop_data$AG == "25_to_29" | pop_data$AG == "30_to_34" | pop_data$AG == "34_to_39", "Millineals",
ifelse(pop_data$AG == "40_to_44" | pop_data$AG == "45_to_49" | pop_data$AG == "50_to_54", "GenX",
ifelse(pop_data$AG == "55_to_59" | pop_data$AG == "60_to_64" | pop_data$AG == "65_to_69" | pop_data$AG == "70_to_74", "Boomers", "SilentGeneration")))))
pop_data_generation <- subset(pop_data_generation, Time == '2020')
Now, we will have to mutate the data according to the ages and sort them into their generations
#Deriving the young, economy active and old measures
agpop_mutated <- pop_data %>%
spread(AG, Pop) %>%
mutate(GenAlpha = rowSums(.[6:7]))%>%
mutate(GenZ = rowSums(.[8:10])) %>%
mutate(Millineals = rowSums(.[11:13])) %>%
mutate(GenX = rowSums(.[14:16])) %>%
mutate(Boomers = rowSums(.[17:20])) %>%
mutate(SilentGeneration = rowSums(.[21:23])) %>%
mutate(TOTAL = rowSums(.[24:30])) %>%
filter(Time == 2020)%>%
filter(TOTAL > 0)
df2 <- data.frame(agpop_mutated)
Filtering the necessary data to create an age gender pyramid for the year 2020
age_gender_df <- subset(pop_data[,c("AG","Sex","Pop", "Time")], Time == '2020')
age_gender_df <- aggregate(formula = Pop ~ Sex + AG, data = age_gender_df, FUN = sum)
age_gender_df <- with(age_gender_df, age_gender_df[order(Sex,AG),])
age_gender_df$Pop <- ifelse(age_gender_df$Sex == "Males", -1*age_gender_df$Pop, age_gender_df$Pop)
Filtering the necessary data to create an generation gender pyramid for the year 2020
gen_gender_df <- subset(pop_data_generation[,c("Generation","Sex","Pop", "Time")], Time == '2020')
gen_gender_df <- aggregate(formula = Pop ~ Sex + Generation, data = gen_gender_df, FUN = sum)
gen_gender_df <- with(gen_gender_df, gen_gender_df[order(Sex,Generation),])
gen_gender_df$Pop <- ifelse(gen_gender_df$Sex == "Males", -1*gen_gender_df$Pop, gen_gender_df$Pop)
Create 3 separate dataframes to visualize the top 3 regions that the various generations are staying in
filtered_agpop_silent_generation <- agpop_mutated %>%
group_by(SZ) %>%
tally(SilentGeneration) %>%
top_n(4)
filtered_agpop_boomer <- agpop_mutated %>%
group_by(SZ) %>%
tally(Boomers) %>%
top_n(4)
filtered_agpop_genx <- agpop_mutated %>%
group_by(SZ) %>%
tally(GenX) %>%
top_n(4)
This dataframes are meant to be for the conclusion, it is a dataframe that totals each generation across time This dataframe follows the following grouping criteria: * Age: 0 to 24 | Students * Age: 25 to 64 | WorkingAdults * Age: 65 and above | Retired
This is in accordance to Singapore’s re-employment maximum age of 68 by 2022 and a minimum retirement age of 62
gen_total_data <- pop_data[,c("AG","Pop","Time")]
gen_total_data$Group <- ifelse(pop_data$AG == "0_to_4" | pop_data$AG == "05_to_09" | pop_data$AG == "10_to_14" | pop_data$AG == "15_to_19" | pop_data$AG == "20_to_24", "Students",
ifelse(pop_data$AG == "25_to_29" | pop_data$AG == "30_to_34" | pop_data$AG == "34_to_39" | pop_data$AG == "40_to_44" | pop_data$AG == "45_to_49" | pop_data$AG == "50_to_54" | pop_data$AG == "55_to_59" | pop_data$AG == "60_to_64", "WorkingAdults", "Retired"))
total_data = aggregate(gen_total_data$Pop, by=list(Category=gen_total_data$Time, gen_total_data$Group), FUN=sum)
colnames(total_data)[1] <- "Time"
colnames(total_data)[2] <- "Group"
colnames(total_data)[3] <- "Pop"
As visualized below, we can see that the largest concentration are in their 40s to 50s and in 10 years time, they would be in their elderly age but there would not be equivalent 20s to 30s year old to replace these people in their 40s to 50s, this would create a gap in the workforce as there aren’t enough people to replace the 40s to 50s when they are not “Economically Active”
age_gender_pyramid <- ggplot(age_gender_df, aes(x = AG, y = Pop, fill = Sex)) +
geom_bar(data = subset(age_gender_df, Sex == "Females"), stat = "identity") +
geom_bar(data = subset(age_gender_df, Sex == "Males"), stat = "identity") +
xlab("Age") + ylab("Population in millions") +
scale_y_continuous(labels = paste0(as.character(c(seq(2, 0, -1), seq(1, 2, 1))), "m")) +
coord_flip()
age_gender_pyramid + theme_classic() + labs(title = "Age Gender Distribution of Singapore 2020",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))
As mentioned above, this chart further emphasizes the disparity in population, when we compare the Boomers with GenX and even GenZ, there’s a huge difference in the population size.
generation_gender_pyramid <- ggplot(gen_gender_df, aes(x = reorder(Generation,Pop), y = Pop, fill = Sex)) +
geom_bar(data = subset(gen_gender_df, Sex == "Females"), stat = "identity") +
geom_bar(data = subset(gen_gender_df, Sex == "Males"), stat = "identity") +
xlab("Age") + ylab("Population in millions") +
scale_y_continuous(labels = paste0(as.character(c(seq(2, 0, -1), seq(1, 2, 1))), "m")) +
coord_flip()
generation_gender_pyramid + theme_classic() + labs(title = "Generation Gender Distribution of Singapore 2020 [Sorted by Population of Generation]",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))
As seen in the visualization below, these are the top 4 Subzones with the population of people from the Silent Generation * Tampines East * Tampines West * Bedok North * Bedok South
viz <- ggplot(filtered_viz_silent_gen) +
geom_sf(aes(fill=Population), size = .05) + geom_sf_label(aes(label = SUBZONE_N))
viz + theme_void() + labs(title = "Top 4 places the Silent Generations are staying in",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))
As seen in the visualization below, these are the top 4 Subzones with the population of people from the Boomers generation * Tampines East * Tampines West * Woodlands East * Bedok North
viz <- ggplot(filtered_viz_boomer) +
geom_sf(aes(fill=Population), size = .05) + geom_sf_label(aes(label = SUBZONE_N))
viz + theme_void() + labs(title = "Top 4 places the Boomers are staying in",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))
As seen in the visualization below, these are the top 4 Subzones with the population of people from Generation X * Tampines East * Jurong West Central * Woodlands East * Bedok North
viz <- ggplot(filtered_viz_genx) +
geom_sf(aes(fill=Population), size = .05) + geom_sf_label(aes(label = SUBZONE_N))
viz + theme_void() + labs(title = "Top 4 places the Generation X are staying in",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))
This chart omits those Subzones with less than 1000 people living in it It shows the composition of generations living in the Subzones. As seen in the visualization below, Tampines East and Woodlands East are Populated with all 3 aging generation
pop_data_generation <- subset(pop_data_generation, Pop > 1000)
pop_data_generation <- na.omit(pop_data_generation)
ggplot(pop_data_generation, aes(fill=Generation, x=Pop, y=reorder(SZ, Pop))) +
geom_bar(position="stack", stat="identity") + theme_classic() + xlab("Population") + ylab("Planning Area") + labs(title = "Stacked Bar Chart for Subzone-Generation Distribution",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))
As the population gets older, the healthcare expenses of the older citizens would increase and the government should plan early especially with a declining birth rate. If the amount of working adults crosses over with the amount of people who are retired, we would not have enough people who are working/economically active.
p <- ggplot(data=subset(total_data, Group == "Retired" | Group == "WorkingAdults"), aes(fill=Group, x=Time, y=Pop, color=Group)) +
geom_line(size = 1) +
geom_point() + scale_y_continuous(label=comma) + scale_x_continuous(breaks = c(2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020)) + theme_classic()
p + labs(title = "Working Adults vs Retired Group 2011-2020",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))
Moreover, with a declining birth rate, in the next few decades, as noted in the total population in the Students group, there would not be sufficient people to replace the current “Working Adults” group.
p <- ggplot(data=subset(total_data, Group == "Students" | Group == "WorkingAdults"), aes(x=Time, y=Pop, color=Group)) +
geom_line(size = 1) +
geom_point() + scale_y_continuous(label=comma) + scale_x_continuous(breaks = c(2011, 2015, 2020)) + theme_classic()
p + facet_wrap(~Group) + labs(title = "Students vs Working Adults Group 2011-2020",
caption = "Data Source: Singstat.com") + theme(plot.title = element_text(hjust = 0.5))
The government should focus on getting the aged care services businesses to open in areas with a high concentration of aging adults such as:
With all this being said, we look at things from the brighter side. Automation and AI might eventually help us reduce the amount of people required in the workforce, which could potentially solve the issue of an aging workforce as noted in this article