Introduction

The objective of this project is to investigate the demographic structure of Singapore by (1)Age Cohort and (2)Planning Area based on data compiled in 2019.

The data is downloaded from this source: https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data.

This dataset consists of information about Singapore residents by Planning Area, Age Group, Sex and Type of Dwelling from June 2011-2019. This project will explore the population structure based on 3 major age groups the Young (0-14 years old), Economically Active Population (15-64 years old) and the Elderly (>65 years old)

Sketches of the plan

Proposed R Visualizations

Plan part1

Plan 1

Plan part 2

Plan 2

To reveal the demographic structure of Singapore, the following visualizations will be completed:

1. General Population Pyramid

Plotting the general population pyramid will give readers an overview of the demographic structure of Singapore as of 2019.

2. Population pyramids by Type of Dwellings

This will give readers further insights into whether there are any differences in terms of the demographic structure based on the Type of Dwellings they live in.

3. Population pyramid by planning area displayed in a map format

This is to give an overview of the population structure of Singapore broken down by Planning Area and displayed in accordance to the geographical locations of the Planning Area. This could shed light on whether there are locations with higher population density of a particular age group which can help us better manage healthcare facilities, infrastructure and amenities in those particular planning areas.

4.Ternary Chart

The ternary chart is a summary of three main age groups - the Young, Economically Active Population and the Elderly. This part is inspired from in-class codes and online reference: https://rpubs.com/tskam/ternary_plot

Challenges and solutions

Describe the major data and design challenges faced in accomplishing the task, and how you plan to overcome these challenges with a proposed sketched design.

Data Challenges The population dataset is given in terms of multiple levels of detail - Population classified by Type of Dwellings, Planning Area and Subzones. How do we aggregate the given dataset to an appropriate level so that it can be visualized?

The dplyr package can help us group and filter data so that it can be used for visualization.

How do we reformat our data to fit into a population pyramid? - scale_y_continuous and coord_flip() can help us display the correct ticker and flip the geom_bar() visualization to create population pyramids. Alternatively, there is a pyramid package which can be used, but it has less customizability in terms of aesthetics so ggplot() will be used instead.

Design challenges How do we gradually add depth to the understanding of our readers through the visualizations?

-For type of dwellings, we can use facet_wrap function to show the population pyramids by each category of dwelling. This will remove the need to filter out data individually and reveal the demographic structures by dwelling at a single glance.

-More importantly, to reveal the fine-grained population demographic structure by planning area is challenging. From research, a new package known as “geofacet” is suitable to help us visualize the population pyramid structure classified by planning area by their location on the map. However, one shortcoming of this package is that not all planning area is accounted for so the visualizations of the population pyramid may not be complete. Nevertheless, it is sufficient to give enough details so that areas which has more elderly and require more attention can be identified at a glance.

The use of geofacet is inspired by the following visualization: https://xang1234.github.io/pyramidgeofacet/

Step-by-Step guide to creating visualizations

Visualisation 1

1. Load Packages

# library(ggplot2) 
# library(gtools) 
# library(ggthemes) 
# library(geofacet) 
# library(ggtern) 
# library(magrittr) 
# library(plotly) 
# library(readr) 
# library(tidyverse) 
# library(dplyr) 

2. Prepare data

Read and Filter data (2019)

library(readr)
library(magrittr) 
library(dplyr)
data2019<- read_csv("respopagesextod2011to2020.csv")  
data2019 <- data2019 %>% filter(Time==2019) 

head(data2019) 
## # A tibble: 6 x 7
##   PA        SZ               AG    Sex   TOD                           Pop  Time
##   <chr>     <chr>            <chr> <chr> <chr>                       <dbl> <dbl>
## 1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 1- and 2-Room Flats         0  2019
## 2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 3-Room Flats               10  2019
## 3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 4-Room Flats               10  2019
## 4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 5-Room and Executive F~    20  2019
## 5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HUDC Flats (excluding thos~     0  2019
## 6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Landed Properties               0  2019

3. Data Wrangling

This section aims to group the existing data into age group and gender only to prepare the data for data visualization in a bar chart format.

Group data by Age and Sex and multiply -1 to one of the gender’s population data to create population pyramid

library(gtools)
groupbysex<- data2019 %>% group_by(AG,Sex) %>% summarise(total=sum(Pop))
groupbysex<-data.frame(groupbysex, stringsAsFactors = TRUE)

for (i in 1:length(groupbysex$total)){
  if(groupbysex$Sex[i]=="Males"){
    groupbysex$total[i] <- -1*groupbysex$total[i]
  }
}

groupbysex$AG<-factor(groupbysex$AG, levels=mixedsort(unique(groupbysex$AG)))

4. Plot Population Pyramid

Use ggplot and geom_bar with subset() to create the population pyramid. Also, set the appropriate themes, legend and reference lines for clearer interpretation of the population pyramid

#graphics.off()
#fig.height=30, fig.width=40
#theme_economist
library(ggplot2)
library(ggthemes)
library(ggeasy)
#scale_fill_discrete(name = "Gender", labels = c("Male", "Female"))
pyramidone <- ggplot(data=groupbysex, aes(x = AG, y = total, fill = Sex)) +
scale_fill_discrete(name = "Gender", labels = c("Male", "Female"))+
  geom_bar(data = subset(groupbysex, Sex == "Males"), stat = "identity") +
  geom_bar(data = subset(groupbysex, Sex == "Females"), stat = "identity") + xlab("Age Group \n")+ scale_y_continuous("\n Population size in thousands",breaks=seq(-180000,180000,20000), limits=c(min(groupbysex$total),max(groupbysex$total)), labels=paste0(as.character(c(seq(180,0,-20),seq(20, 180,20)),"k")))+coord_flip()+
  theme(plot.title = element_text(hjust = -0.2))+
  ggtitle("Singapore 2019 Population by Age Group")+
    geom_vline(xintercept=4, col="blue", lwd=3, lty="dotted")+
    geom_vline(xintercept=14,col="blue", lwd=3, lty="dotted")+
    theme_economist(base_size=20)+
    annotate("text",x=2,y=120000,label="Young (0-14)", size=5)+
    annotate("text",x=6,y=100000,label="Economically Active Population(15-64)",size=5)+
    annotate("text",x=16,y=110000,label="Aged Population (>65)", size=5)

pyramidone

Visualisation 2

1. Data Wrangling

Group by Type of Dwellings, Sex and Age

#Group data
groupbytod<- data2019 %>% group_by(TOD,Sex,AG) %>% summarise(total=sum(Pop))

#Iterate through each row and multiply by -1 for male data
for (i in 1:length(groupbytod$total)){
  if(groupbytod$Sex[i]=="Males"){
    groupbytod$total[i] <- -1*groupbytod$total[i]
  }
}

#Turn Age groups into factors and sort them by unique age groups so that it can be arranged in the correct order
groupbytod$AG<-factor(groupbytod$AG, levels=mixedsort(unique(groupbytod$AG)))

2.Plot second population pyramid

With a new facet_wrap() component added with ~TOD (Type of Dwellings) as categories - the rest of the visualization remains the same as before

#Create visualization for population pyramid
pyramidtwo <- ggplot(data=groupbytod, aes(x = AG, y = total, fill = Sex)) +
  geom_bar(data = subset(groupbytod, Sex == "Males"), stat = "identity") +
  geom_bar(data = subset(groupbytod, Sex == "Females"), stat = "identity") +
  facet_wrap(~TOD, ncol=3, scales="free")+
  scale_y_continuous(breaks=seq(-180000,180000,20000),
                     limits=c(min(groupbytod$total),max(groupbytod$total)),
                              labels=paste0(as.character(c(seq(180,0,-20),seq(20, 180,20)),"k")))+
                     coord_flip() + ylab("Population size in thousands") + xlab("Age group")+
  ggtitle("2019 Population Grouped By Type of Dwelling")+
    theme(
    legend.text = element_text(size = 8),
    legend.title = element_text(size=10),
    axis.title.x=element_text(size=10, family="ITC Officina Sans", vjust=-0.5,face="bold"),
    axis.title.y=element_text(size=10,family="ITC Officina Sans",vjust=2, face="bold"),
    plot.title=element_text(hjust=0.5),
    plot.margin=margin(1,1,1,1,"cm"))


pyramidtwo
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

Visualization 3

1. Data Wrangling

Group data by Sex, Planning Area (PA) and Age Group (AG)

groupbypa<- data2019 %>% group_by(Sex,PA,AG) %>% summarise(total=sum(Pop))
groupbypa<-data.frame(groupbypa, stringsAsFactors = TRUE)
#iterate through males data
for (i in 1:length(groupbypa$total)){
  if(groupbypa$Sex[i]=="Males"){
    groupbypa$total[i] <- -1*groupbypa$total[i]
  }
}


groupbypa$AG<-factor(groupbypa$AG, levels=mixedsort(unique(groupbypa$AG)))

2. Plot population pyramid by planning area with an additional component from before (facet_geo)

library(geofacet)
pyramidthree <- ggplot(data=groupbypa, aes(x = AG, y = total, fill = Sex)) +
  facet_geo(vars(PA), grid="sg_planning_area_grid1")+
  geom_bar(stat="identity", width=1, color="black")+
  scale_y_continuous(name="population",breaks=seq(-180000,180000,20000),
                     limits=c(min(groupbypa$total),max(groupbypa$total)),
                              labels=paste0(as.character(c(seq(180,0,-20),seq(20, 180,20)),"k")))+
                     coord_flip() + xlab("Population size in thousands") + ylab("Age group")+ ggtitle("2019 Population Pyramid by Planning Area")+theme(strip.text = element_text(size=60))+
    theme(
    legend.text = element_text(size = 8),
    legend.title = element_text(size=10),
    axis.title.x=element_text(size=10, family="ITC Officina Sans", vjust=-0.5,face="bold"),
    axis.title.y=element_text(size=10,family="ITC Officina Sans",vjust=2, face="bold"),
    plot.title=element_text(hjust=0.5, size=200),
    plot.margin=margin(1,1,1,1,"cm"))
## Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
## Please use `as_label()` or `as_name()` instead.
## This warning is displayed once per session.
pyramidthree
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

Visualization 4

1. Data Wrangling

library(tidyr)
data2019<- read_csv("respopagesextod2011to2020.csv")
data2019 <- data2019 %>% filter(Time==2019)
head(data2019)
## # A tibble: 6 x 7
##   PA        SZ               AG    Sex   TOD                           Pop  Time
##   <chr>     <chr>            <chr> <chr> <chr>                       <dbl> <dbl>
## 1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 1- and 2-Room Flats         0  2019
## 2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 3-Room Flats               10  2019
## 3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 4-Room Flats               10  2019
## 4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 5-Room and Executive F~    20  2019
## 5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HUDC Flats (excluding thos~     0  2019
## 6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Landed Properties               0  2019
data2019$AG<- factor(data2019$AG, levels=mixedsort(unique(data2019$AG)))
#Deriving the young, economy active and old measures
agpop_mutated <- data2019 %>%
  spread(AG, Pop) %>%
  mutate(YOUNG = rowSums(.[6:8]))%>%
  mutate(ACTIVE = rowSums(.[9:18]))  %>%
  mutate(OLD = rowSums(.[19:24])) %>%
  mutate(TOTAL = rowSums(.[6:24])) %>%
  filter(Time == 2019)%>%
  filter(TOTAL > 0)
df <- data.frame(agpop_mutated)

2. Plot Ternary Chart for an overall summary

library(plotly)
axis <- function(txt) {
  list(
    title = txt, tickformat = "%", tickfont = list(size = 10)
  )
}

ternaryAxes <- list(
  aaxis = axis("Active"),
  baxis = axis("Young"),
  caxis = axis("Old")
)

# Initiating a plotly visualization
plot_ly(
  agpop_mutated,
  a = ~ACTIVE,
  b = ~YOUNG,
  c = ~OLD,
  #text = ~YOUNG,
  type = "scatterternary",
  color = I("blue")
)%>%
  layout(
    ternary = ternaryAxes
  )
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

preserveef58263c72d0221b

Results revealed from visualization

Key finding 1

Based on visualization 1 and 4, one can see that Singapore is a rapidly ageing population because of the broad middle segment as seen in the population pyramid. While most Singaporeans still fall into the “Economically Active Population” category, it can easily be inferred from the visualization that in a few years’ time the broad middle segment of population will advance in years into the elderly population category. The summary provided by the ternary plot allows us to understand that currently majority of our population are the EAP, who will soon be advancing in years to be part of the ageing population.

Key finding 2

Based on visualization 2, one can infer that the demographic structures across different types of dwellings remain quite similar. While most other types of dwellings have a broad middle segment, the 3 room flat HDB has a structure that broadens towards the top. This could mean that many of the people living in 3 room HDB flats are elderly. Other types of dwellings have similar distributions - narrow bottom/broad middle and narrow bottom.

Key finding 3

Based on visualization 3, one can infer that Sengkang and Punggol have quite distinctive population structures with a broad middle segment. This means this area has proportionately higher number Economically Active Population living in the area. Bedok is a planning area to look out for as the pyramid structure is V-shape which means there is proportionately a higher number of elderly living in the area as compared to other areas.