1. Overview

This visualization analyzes the demographic structure of Singapore's resident population in 2019, by age cohort and by planning area. The data come from the Department of Statistics Singapore table "Singapore Residents by Planning Area/Subzone, Age Group and Sex", available at https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data.

1.1 Challenges

  • Challenge 1: The source data contain missing values, for example population counts for some individual subzones.
  • Challenge 2: The total population of each planning area is not given directly and has to be calculated by summing the subzone counts.
  • Challenge 3: With 55 planning areas, it is difficult to see how the distribution of age cohorts differs from one area to another.
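One way to handle Challenge 1 is sketched below in base R. It assumes that missing counts appear as `-` or as empty strings in the population column; that encoding is an assumption to check against the actual file. Missing subzone counts are converted to `NA`, counted, and then treated as zero so they do not break later sums. The rows and the helper name `clean_population` are hypothetical.

```r
# Hypothetical toy rows standing in for the SingStat file; values are illustrative only.
raw <- data.frame(
  PA  = c("Area A", "Area A", "Area B"),
  SZ  = c("A1", "A2", "B1"),
  Pop = c("120", "-", ""),
  stringsAsFactors = FALSE
)

# Assumption: "-" and "" mark missing population counts.
clean_population <- function(x) {
  x <- trimws(as.character(x))
  x[x %in% c("-", "")] <- NA
  as.numeric(x)
}

raw$Pop <- clean_population(raw$Pop)
n_missing <- sum(is.na(raw$Pop))   # number of subzone counts that were missing
raw$Pop[is.na(raw$Pop)] <- 0       # treat missing counts as zero before summing
```

Replacing missing counts with zero is a design choice: it keeps totals computable, but it understates the population if the gaps hide real residents, so reporting `n_missing` alongside the chart is worthwhile.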

1.2 Sketch of Proposed DataViz Design

Gender Pyramid

Heatmap

2. Suggestions

3. DataViz Step-by-Step Guide

3.1 Population Pyramid Chart

Step 1: Import packages and load data

# Install any missing packages, then load them all
packages <- c('tidyverse', 'ggthemes', 'ggtern', 'plotly', 'viridis')

for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

# Read the 2019 extract and give the six columns short names
rdata <- read.csv("2019.csv")

names(rdata) <- c("PA", "Subzone", "Age", "Gender", "Population", "Year")

Step 2: Plot population against age group, with female counts negated so that the two sexes mirror each other, to build the gender pyramid

# Keep age groups in file order rather than alphabetical order
rdata$Age <- factor(rdata$Age, levels = unique(rdata$Age))
levels(rdata$Age)
##  [1] "0_to_4"      "5_to_9"      "10_to_14"    "15_to_19"    "20_to_24"   
##  [6] "25_to_29"    "30_to_34"    "35_to_39"    "40_to_44"    "45_to_49"   
## [11] "50_to_54"    "55_to_59"    "60_to_64"    "65_to_69"    "70_to_74"   
## [16] "75_to_79"    "80_to_84"    "85_to_89"    "90_and_over"
# Negate female counts so their bars extend to the left of zero
rdata$Population <- ifelse(rdata$Gender == "Females", -1 * rdata$Population, rdata$Population)
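To see what the sign flip does, the sketch below applies the same `ifelse()` step to a few made-up rows in the renamed layout and then totals them per age group and gender, which is effectively what the stacked bars in the pyramid display. The numbers are illustrative and not taken from the 2019 file.

```r
# Made-up rows in the renamed layout (Age, Gender, Population); not real data.
demo <- data.frame(
  Age        = factor(c("0_to_4", "0_to_4", "0_to_4", "5_to_9"),
                      levels = c("0_to_4", "5_to_9")),
  Gender     = c("Males", "Females", "Females", "Males"),
  Population = c(100, 80, 30, 90)
)

# Same step as above: female counts become negative.
demo$Population <- ifelse(demo$Gender == "Females", -1 * demo$Population, demo$Population)

# One value per Age x Gender: several subzone rows collapse into a single bar length.
totals <- aggregate(Population ~ Age + Gender, data = demo, FUN = sum)
totals
```

The female total for `0_to_4` comes out negative, so after `coord_flip()` its bar is drawn to the left of the zero line while the male bars extend to the right.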

# Females (negative values) extend left and males right once the axes are flipped
age_groups <- ggplot(data = rdata, aes(x = Age, y = Population, fill = Gender)) +
  geom_col(data = subset(rdata, Gender == "Females")) +
  geom_col(data = subset(rdata, Gender == "Males")) +
  # Ticks every 50,000 people, labelled as absolute values in thousands
  scale_y_continuous(breaks = seq(-150000, 150000, 50000),
                     labels = as.character(c(seq(150, 0, -50), seq(50, 150, 50)))) +
  coord_flip()
viz <- age_groups +
  labs(x = "Age Group",
       y = "Population Size (in thousands)",
       title = "Population Distribution by Age Group in 2019") +
  theme(legend.position = "right")
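The hard-coded `breaks` and `labels` above stop at 150,000, so they must be edited by hand if the data change. A possible alternative, sketched here as an assumption rather than the author's approach, is to pass a function to `labels`, which `scale_y_continuous()` accepts: ggplot2 calls it with the break values and uses the returned text. The helper name `pyramid_labels` is hypothetical.

```r
# Hypothetical labeller: show break positions as absolute values in thousands,
# so the negative (female) side of the pyramid reads as positive counts.
pyramid_labels <- function(x) {
  as.character(abs(x) / 1000)
}

pyramid_labels(c(-150000, -50000, 0, 50000, 150000))
```

In the pyramid it would be used as `scale_y_continuous(labels = pyramid_labels)`, leaving ggplot2 to choose the break positions.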

ggplotly(viz, width = 900, height = 700)

3.2 Heatmap of Age Cohorts by Planning Area

Step 1: Load data and calculate total population size

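rdata <- read.csv("2019.csv") %>%
  setNames(c("PA", "SZ", "AG", "Sex", "Pop", "Year")) %>%  # same column order as in 3.1
  mutate(AG = factor(AG, levels = unique(AG))) %>%         # keep age groups in file order
  group_by(PA, SZ, AG, Sex, Year) %>%                      # Pop must not be a grouping column
  dplyr::summarise(Pop = sum(Pop), .groups = "drop")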
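Challenge 2 calls for the total population of each planning area. Once the counts are numeric, this is a sum over subzones, age groups and sexes. Below is a base-R sketch with `aggregate()` on made-up rows that use the heatmap's column names (`PA`, `SZ`, `AG`, `Sex`, `Pop`); the values are illustrative only.

```r
# Made-up rows using the heatmap's column names; values are illustrative only.
demo <- data.frame(
  PA  = c("Area A", "Area A", "Area A", "Area B"),
  SZ  = c("A1", "A1", "A2", "B1"),
  AG  = c("0_to_4", "5_to_9", "0_to_4", "0_to_4"),
  Sex = c("Males", "Females", "Males", "Females"),
  Pop = c(10, 20, 30, 5),
  stringsAsFactors = FALSE
)

# Sum every subzone, age group and sex within each planning area.
pa_totals <- aggregate(Pop ~ PA, data = demo, FUN = sum)

# Largest planning areas first.
pa_totals <- pa_totals[order(-pa_totals$Pop), ]
pa_totals
```

The dplyr equivalent would be `group_by(PA)` followed by `summarise(Pop = sum(Pop))`.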

Step 2: Build the heatmap of age groups against planning areas

heatmap_PA <- rdata %>%
  group_by(AG, PA) %>%
  dplyr::summarise(Pop = sum(Pop), .groups = "drop") %>%  # total per age group and planning area
  ggplot(aes(x = factor(AG), y = PA, fill = Pop)) +
  geom_tile() +
  ggtitle(label = "Population Distribution by Age Group and Planning Area in 2019") +
  theme(axis.text.x = element_text(size = 8, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 8),
        plot.title = element_text(hjust = 0.5)) +
  scale_fill_gradient(name = "Total Population Size",
                      low = "#87F2F9",
                      high = "#2C73B3") +  # hex colours need a leading "#"
  xlab("Age Groups") +
  ylab("Planning Area")

ggplotly(heatmap_PA, width = 800, height = 800)
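Calling `factor(AG)` on a character column sorts the age groups alphabetically, so `10_to_14` comes before `5_to_9` on the x-axis. If the file order cannot be relied on, one option (an assumption, not the author's code) is to order the labels by the number before the first underscore. The helper `order_age_groups` is hypothetical.

```r
# Hypothetical helper: sort age-group labels such as "5_to_9" or "90_and_over"
# by the number before the first underscore.
order_age_groups <- function(labels) {
  labels <- unique(as.character(labels))
  lower  <- as.integer(sub("_.*$", "", labels))  # "90_and_over" -> 90
  labels[order(lower)]
}

ag <- c("10_to_14", "5_to_9", "90_and_over", "0_to_4", "85_to_89")
factor(ag, levels = order_age_groups(ag))
```

Inside the heatmap it could replace `factor(AG)` with `factor(AG, levels = order_age_groups(AG))`.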

4. Final Findings