This visualization aims to reveal the demographic structure of Singapore’s population specifically by age group and planning area in the year 2019.
“Singapore Residents by Planning Area Subzone, Age Group, Sex and Type of Dwelling, June 2011-2019”, data set by Singapore Department of Statistics.
| Type | Challenge | Solution |
|---|---|---|
| Data | AG 5_to_9 is placed at the wrong position | Rename it to 05_to_9 so that it comes immediately after “0_to_4” |
| Design | The large number of age groups can show a more representative population distribution but also makes it challenging to interpret so many age groups | Use histogram to show the shape of the distribution and a ternary plot to visualize the population in terms of 3 main categories: young dependents, economically active and old dependents |
| Data | Getting the right rows and columns for the plots | Prepare 2 different data sets from the original data set to fulfill the respective requirements for each plot |
| Design | Difficult to interpret and differentiate the points on a ternary plot because of the 3 axes and the identical dots on the plot | Scale the size of the markers by Old Age Support Ratio and use colors to differentiate the Planning Areas |
| Design | Labeling x axis tick marks would make it very cluttered due to the large number of age groups | Hide the tick marks and use a colored legend to distinguish the age groups instead |
packages = c('plotly', 'readr', 'dplyr', 'tidyr')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
pop_data <- read_csv('population-data/respopagesextod2011to2020.csv')
pop_filtered_data <- pop_data %>%
filter(Time==2019)
pop_filtered_data$AG <- as.character(pop_filtered_data$AG)
pop_filtered_data$AG[pop_filtered_data$AG == "5_to_9"] <-"05_to_9"
pop_filtered_data
## # A tibble: 98,192 x 7
## PA SZ AG Sex TOD Pop Time
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Ang Mo K… Ang Mo Kio Town… 0_to… Males HDB 1- and 2-Room Flats 0 2019
## 2 Ang Mo K… Ang Mo Kio Town… 0_to… Males HDB 3-Room Flats 10 2019
## 3 Ang Mo K… Ang Mo Kio Town… 0_to… Males HDB 4-Room Flats 10 2019
## 4 Ang Mo K… Ang Mo Kio Town… 0_to… Males HDB 5-Room and Executive… 20 2019
## 5 Ang Mo K… Ang Mo Kio Town… 0_to… Males HUDC Flats (excluding th… 0 2019
## 6 Ang Mo K… Ang Mo Kio Town… 0_to… Males Landed Properties 0 2019
## 7 Ang Mo K… Ang Mo Kio Town… 0_to… Males Condominiums and Other A… 50 2019
## 8 Ang Mo K… Ang Mo Kio Town… 0_to… Males Others 0 2019
## 9 Ang Mo K… Ang Mo Kio Town… 0_to… Femal… HDB 1- and 2-Room Flats 0 2019
## 10 Ang Mo K… Ang Mo Kio Town… 0_to… Femal… HDB 3-Room Flats 10 2019
## # … with 98,182 more rows
pop_hist_data <- pop_filtered_data %>%
group_by(PA,AG) %>%
summarise(Pop = sum(Pop)) %>%
filter(Pop > 0)
pop_hist_data
## # A tibble: 757 x 3
## # Groups: PA [42]
## PA AG Pop
## <chr> <chr> <dbl>
## 1 Ang Mo Kio 0_to_4 5420
## 2 Ang Mo Kio 05_to_9 6230
## 3 Ang Mo Kio 10_to_14 7380
## 4 Ang Mo Kio 15_to_19 7930
## 5 Ang Mo Kio 20_to_24 8920
## 6 Ang Mo Kio 25_to_29 10620
## 7 Ang Mo Kio 30_to_34 10510
## 8 Ang Mo Kio 35_to_39 10940
## 9 Ang Mo Kio 40_to_44 11760
## 10 Ang Mo Kio 45_to_49 12570
## # … with 747 more rows
hist <- pop_hist_data %>%
ggplot(aes(x=AG,y=Pop))+
geom_bar(
aes(fill = AG), stat = "identity", color = "white",
position = position_dodge(0.9)
)+
facet_wrap(~PA)+
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
strip.text.x = element_text(size=6),
axis.text.y=element_text(size=5)
)+
ggtitle("Population Distribution by Age Group by Planning Area 2019")+
xlab("Age Group")+
ylab("Population")
hist
pop_pivoted_data <- pop_filtered_data %>%
group_by(PA,AG) %>%
summarise(Pop = sum(Pop), .groups='drop') %>%
pivot_wider(names_from = AG, values_from = Pop)
pop_pivoted_data
## # A tibble: 55 x 20
## PA `0_to_4` `05_to_9` `10_to_14` `15_to_19` `20_to_24` `25_to_29`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Ang Mo Kio 5420 6230 7380 7930 8920 10620
## 2 Bedok 10020 11640 13300 14640 16660 19530
## 3 Bishan 2850 3850 4430 4740 5570 7090
## 4 Boon Lay 0 0 0 0 0 0
## 5 Bukit Batok 7130 6640 7800 8800 9850 12510
## 6 Bukit Merah 6100 6650 6640 6380 6850 9140
## 7 Bukit Panjang 6700 7230 7680 8500 9570 10560
## 8 Bukit Timah 3160 4820 5000 4810 4690 4710
## 9 Central Water… 0 0 0 0 0 0
## 10 Changi 130 130 140 110 80 100
## # … with 45 more rows, and 13 more variables: 30_to_34 <dbl>, 35_to_39 <dbl>,
## # 40_to_44 <dbl>, 45_to_49 <dbl>, 50_to_54 <dbl>, 55_to_59 <dbl>,
## # 60_to_64 <dbl>, 65_to_69 <dbl>, 70_to_74 <dbl>, 75_to_79 <dbl>,
## # 80_to_84 <dbl>, 85_to_89 <dbl>, 90_and_over <dbl>
pop_mutated <- pop_pivoted_data %>%
mutate(YOUNG = rowSums(.[3:7]))%>%
mutate(ACTIVE = rowSums(.[8:15])) %>%
mutate(OLD = rowSums(.[16:20])) %>%
mutate(TOTAL = rowSums(.[3:20])) %>%
mutate(Old_age_ratio = round(.$ACTIVE/.$OLD,2)) %>%
filter(TOTAL > 0)
df <- data.frame(pop_mutated)
axis <- function(txt) {
list(
title = txt, tickformat = "%", tickfont = list(size = 10)
)
}
ternaryAxes = list(
aaxis = axis("ACTIVE(A)"),
baxis = axis("OLD(B)"),
caxis = axis("YOUNG(C)")
)
ternary <- plot_ly(
df, a = ~ACTIVE, b = ~`OLD`, c = ~YOUNG, color = ~PA, type = "scatterternary",text=~paste("<br>Planning Area:",df$PA,'<br>Old Age Support Ratio:',Old_age_ratio) ,mode="marker",marker=list(size = ~Old_age_ratio*1.2, opacity=1),colors = "Set1"
) %>%
layout(title="Old Age Support Ratio (A/B) by Planning Area 2019 ",
ternary = ternaryAxes,margin = 0.05,showlegend = TRUE
)
ternary
hist
ternary
The top visualization reveals the population count as well as the spread across the age groups for each planning area.
The bottom visualization complements the top by grouping the age groups into Young, Active and Old categories and providing the Old Age Support Ratio across each planning area.