1 Overview

As a global city and an essential trade port, Singapore has long been a melting pot of cultures, races and religions. Managing so many different communities under one roof has not always been easy. Since the founding of modern Singapore, the country has gone through difficult periods of social unrest to reach and maintain the current delicate equilibrium between the various races.

1.1 Problem Statement

This project therefore aims to visualise and understand the ethnic distribution of Singapore’s population.

1.2 Purpose of Visualisation

It is important for Singapore’s government agencies and everyday citizens to understand the ethnic make-up of Singapore, as this allows them to plan resources that bolster social cohesion, ensure a fair and equitable distribution of those resources, and prevent ethnic unrest.

Singapore classifies and records 4 groupings for its ethnic population:

Chinese
Malay
Indian
Others (includes all other ethnic groups that do not fall under the first three, e.g. Eurasians, Filipinos, Japanese)

To understand the distribution, we will use the General Household Survey 2015, a survey conducted by the Singapore Department of Statistics in the years between population censuses.

1.3 The Data

We will be utilising data from the following sources:

URA Master Plan subzone boundary in shapefile format (i.e. MP14_SUBZONE_WEB_PL). This is geospatial data containing the geographical boundaries of Singapore at the planning subzone level, based on the URA Master Plan 2014.
General Household Survey 2015, Basic Demographic Characteristics (Tables 1 & 8). The original data was downloaded in Excel format; the relevant tables were extracted into CSV files and recoded into a more accessible format.

2 Expected Data and Design Challenges and Mitigation

The table below lists the challenges we expect to face and how each will be mitigated.

Challenge	Mitigation
Data is not in the required format or shape	Use tidyverse to manipulate the data and prepare it for visualisation
Population distribution data is not geospatially mapped	Join the attribute data (ethnic group population distribution) with the geospatial data (subzone boundary)
Unsure which types of visualisation to use and how to create them in ggplot	Research possible visualisations and experiment with them as part of the project
A large depth of information needs to be showcased in one visualisation	Create multiple visualisations and combine them into one unified visualisation

3 Visualisation Sketch

The sketch below shows the expected final visualisation. It is a rough sketch that will serve as a guide as we build the project in the next few sections.

4 Data Viz Step by Step

4.1 Install and load R packages

For the purpose of this project we will be utilising the following packages:

tidyverse: a set of essential packages for data manipulation and exploration
sf: encodes spatial vector data
tmap: creates dynamic thematic maps
plotly: creates interactive web graphics from ‘ggplot2’ graphs
tmaptools: tools for reading and processing spatial data
gridExtra: user-level functions for working with “grid” graphics, notably arranging multiple grid-based plots on a page
sunburstR: makes interactive ‘d3.js’ sequence sunburst diagrams in R
htmltools: tools for HTML generation and output
d3r: a suite of functions that ease the use of ‘d3.js’ in R

The following code chunk checks whether each required package is installed in your environment, installs it if it is not, and then loads it.

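# Packages needed for this project
packages <- c('tidyverse', 'sf', 'tmap', 'plotly', 'tmaptools', 'sunburstR', 'htmltools', 'd3r', 'gridExtra')

for (p in packages) {
  # require() returns FALSE when the package is missing, so install it first
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  # Attach the package for use in this session
  library(p, character.only = TRUE)
}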

4.2 Importing Ethnicity by Citizenship type

We will start by importing the ethnicity data, which gives the number of people of each ethnicity living in Singapore in 2015, broken down by age group and citizenship type (Permanent Resident, Singapore Citizen, etc.).

eth_citizenship <- read_csv('data/aspatial/Ethnic_Mix_Split3.csv')
eth_citizenship1 <- read_csv('data/aspatial/Ethnic_Mix_Split2.csv')

head(eth_citizenship)

4.3 Visualising Ethnicity Distribution By Age Group

As the data in eth_citizenship has a pseudo-hierarchical structure, we can create a sunburst diagram.

4.3.1 Making table longer

To make the data easier to use in our visualisation, let's first convert the table from wide to long, so that each ethnicity column becomes a set of rows.

eth_citizenship<-eth_citizenship %>%
  pivot_longer(!`Age Group (Years)`&!`Citizenship`, names_to = "ethnicity", values_to = "population")
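As a rough illustration of what this reshaping does, the sketch below applies the same pivot_longer() pattern to a tiny table. The toy_wide data frame and its values are made up for illustration and are not taken from the survey.

```r
library(tidyr)

# Hypothetical toy table in the same wide shape as the survey extract
# (the values are invented for illustration only).
toy_wide <- data.frame(
  `Age Group (Years)` = c("0 - 4", "5 - 9"),
  Citizenship = c("Citizen", "Citizen"),
  Chinese = c(10, 12),
  Malay = c(3, 4),
  check.names = FALSE
)

# Every column except the two identifier columns is stacked into rows:
# the old column name goes to "ethnicity", the cell value to "population".
toy_long <- pivot_longer(
  toy_wide,
  cols = !c(`Age Group (Years)`, Citizenship),
  names_to = "ethnicity",
  values_to = "population"
)

print(toy_long)
```

Two rows with two ethnicity columns become four rows, one per age group and ethnicity combination.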

4.3.2 Visualisation

The Sunburst will radiate out in levels, with the first level being citizenship type, then ethnicity, then age group.

To build this visualisation, we first rearrange the columns so that the first level is on the left of the table and the last level is on the right. We then convert the data into a nested tree structure, which the sunburst chart uses as its input. The following code chunk does this.

eth_citizenship <- eth_citizenship%>%
  select(Citizenship,ethnicity,`Age Group (Years)`,population)

tree <- d3_nest(eth_citizenship, value_cols = "population")

sb3 <- sunburst(
    tree,
    legend = list(w=250), # make extra room for our legend
    count = TRUE,
    width = "100%",
    height = 600,
    valueField = "population"
)

Sunburst Chart of Singapore Population by Citizenship, Ethnicity and Age Group

(Click or hover on chart to show proportion)

The result is a radial diagram that users can interact with to view how Singapore's population splits proportionally across citizenship type, ethnic group and age group.
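To see why the column order matters, the minimal sketch below runs d3_nest() on a three-row toy table. The toy data frame is an assumption made for illustration; the point is only that the leftmost column becomes the innermost ring and the value column is carried on the leaves.

```r
library(d3r)

# Hypothetical toy data: columns ordered from first level to last level,
# followed by the value column (values are invented).
toy <- data.frame(
  Citizenship = c("Citizen", "Citizen", "PR"),
  ethnicity = c("Chinese", "Malay", "Chinese"),
  population = c(10, 3, 2)
)

# d3_nest() nests the non-value columns from left to right and, by default,
# returns the hierarchy as a JSON string.
tree_json <- d3_nest(toy, value_cols = "population")

cat(tree_json)
```

Swapping the first two columns of toy would make ethnicity the first level instead, which is why the select() step comes before the nesting.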

4.4 Population Pyramid By Gender and Ethnic Group

Another useful visualisation is the population pyramid, which agencies can use to understand how the population of each ethnicity is distributed across age groups. This may help them design targeted policies to encourage birth rates or offer support for the sandwiched classes.

4.4.1 Data Preparation

Before creating the visualisation we need to prepare the data. First, we multiply all the male population counts by -1, so that male and female bars extend in opposite directions and form the two sides of the pyramid. Then we make the table longer so that it is easier to plot.

Additionally, we need to convert the age group into a factor so that the plot follows the order of the age groups rather than alphabetical order.

# Negate the male counts so that they plot to the left of zero, then reshape to long format
eth_citizenship1 <- eth_citizenship1 %>%
  mutate(across(ends_with("_Males"), ~ .x * -1)) %>%
  pivot_longer(!`Age Group (Years)` & !`Citizenship`, names_to = "ethnicity", values_to = "population")

# Fix the age-group order to the order of first appearance in the data
eth_citizenship1$`Age Group (Years)` <- factor(eth_citizenship1$`Age Group (Years)`,
                                               levels = unique(eth_citizenship1$`Age Group (Years)`))

eth_citizenship1
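The two preparation steps can be checked on a small example. The sketch below uses a made-up toy table (not the survey data) to show that the male counts become negative and that the factor keeps the age groups in their original order instead of sorting "10 - 14" ahead of "5 - 9".

```r
library(dplyr)
library(tidyr)

# Hypothetical toy table; the values are invented for illustration.
toy <- data.frame(
  `Age Group (Years)` = c("0 - 4", "5 - 9", "10 - 14"),
  Citizenship = "Resident",
  Chinese_Males = c(5, 6, 7),
  Chinese_Females = c(4, 6, 8),
  check.names = FALSE
)

toy_long <- toy %>%
  # Flip the sign of every column whose name ends in "_Males"
  mutate(across(ends_with("_Males"), ~ .x * -1)) %>%
  pivot_longer(
    cols = !c(`Age Group (Years)`, Citizenship),
    names_to = "ethnicity",
    values_to = "population"
  )

# unique() keeps the order of first appearance, so the levels follow the table
toy_long$`Age Group (Years)` <- factor(
  toy_long$`Age Group (Years)`,
  levels = unique(toy_long$`Age Group (Years)`)
)

levels(toy_long$`Age Group (Years)`)
```

Without the explicit levels, a plot of the character column would place the age groups in alphabetical order.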

4.4.2 Visualisation

Once the data preparation is done, we can create the population pyramids. We will use ggplotly to make the plots interactive so that users can explore the visualisations. The following code chunk does this.

pyramid_c <- ggplot(eth_citizenship1, aes(x = `Age Group (Years)`, y = population, fill = ethnicity)) + 
  geom_bar(data = subset(eth_citizenship1, ethnicity == "Chinese_Females"), stat = "identity") + 
  geom_bar(data = subset(eth_citizenship1, ethnicity == "Chinese_Males"), stat = "identity") + 
  scale_y_continuous(breaks = seq(-15000000, 15000000, 5000000), 
                     labels = paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")) + 
  coord_flip() + 
  theme(legend.title = element_blank())+
  scale_fill_manual(values = c("#a6cee3","#1f78b4"))

pyramid_m <- ggplot(eth_citizenship1, aes(x = `Age Group (Years)`, y = population, fill = ethnicity)) + 
  geom_bar(data = subset(eth_citizenship1, ethnicity == "Malay_Females"), stat = "identity") + 
  geom_bar(data = subset(eth_citizenship1, ethnicity == "Malay_Males"), stat = "identity") + 
  scale_y_continuous(breaks = seq(-15000000, 15000000, 5000000), 
                     labels = paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")) + 
  coord_flip() + 
  theme(legend.title = element_blank())+
  scale_fill_manual(values = c("#b2df8a","#33a02c"))

pyramid_i <- ggplot(eth_citizenship1, aes(x = `Age Group (Years)`, y = population, fill = ethnicity)) + 
  geom_bar(data = subset(eth_citizenship1, ethnicity == "Indian_Females"), stat = "identity") + 
  geom_bar(data = subset(eth_citizenship1, ethnicity == "Indian_Males"), stat = "identity") + 
  scale_y_continuous(breaks = seq(-15000000, 15000000, 5000000), 
                     labels = paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")) + 
  coord_flip() + 
  theme(legend.title = element_blank())+
  scale_fill_manual(values = c("#fb9a99","#e31a1c"))

pyramid_o <- ggplot(eth_citizenship1, aes(x = `Age Group (Years)`, y = population, fill = ethnicity)) + 
  geom_bar(data = subset(eth_citizenship1, ethnicity == "Others_Females"), stat = "identity") + 
  geom_bar(data = subset(eth_citizenship1, ethnicity == "Others_Males"), stat = "identity") + 
  scale_y_continuous(breaks = seq(-15000000, 15000000, 5000000), 
                     labels = paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")) + 
  coord_flip() + 
  theme(legend.title = element_blank())+
  scale_fill_manual(values = c("#fdbf6f","#ff7f00"))

p_c <-ggplotly(pyramid_c)
p_m <-ggplotly(pyramid_m)
p_i <-ggplotly(pyramid_i)
p_o <-ggplotly(pyramid_o)

sp<-subplot(p_c,p_m,p_i,p_o,nrows=2, shareY=TRUE)
sp
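The four pyramid definitions differ only in the ethnic group and the colours, so the pattern could also be wrapped in a small function. The sketch below is one possible way to do that, not the approach used above: make_pyramid is a hypothetical helper, the toy data is invented, and the axis is labelled with absolute values instead of the fixed breaks used in the project code.

```r
library(ggplot2)

# Hypothetical helper: builds one pyramid for the group named by `prefix`,
# e.g. "Chinese" selects "Chinese_Females" and "Chinese_Males".
make_pyramid <- function(data, prefix, colours) {
  keep <- data$ethnicity %in% paste0(prefix, c("_Females", "_Males"))
  ggplot(data[keep, ],
         aes(x = `Age Group (Years)`, y = population, fill = ethnicity)) +
    geom_col() +
    # Male counts are stored as negatives; show them as positive labels
    scale_y_continuous(labels = abs) +
    coord_flip() +
    theme(legend.title = element_blank()) +
    scale_fill_manual(values = colours)
}

# Invented toy data in the same long shape as eth_citizenship1
toy <- data.frame(
  `Age Group (Years)` = factor(rep(c("0 - 4", "5 - 9"), 2),
                               levels = c("0 - 4", "5 - 9")),
  ethnicity = rep(c("Chinese_Females", "Chinese_Males"), each = 2),
  population = c(4, 6, -5, -6),
  check.names = FALSE
)

pyramid_toy <- make_pyramid(toy, "Chinese", c("#a6cee3", "#1f78b4"))
pyramid_toy
```

With such a helper, each of the four plots would reduce to a single call that passes the full long table, the group prefix and a pair of colours.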

4.5 Ethnicity by Subzone

Now let's bring in the ethnicity split of Singapore's resident population by the subzone they reside in.

eth_subzone <- read_csv('data/aspatial/Ethnic_Mix_Subzone1.csv')

head(eth_subzone)

4.5.1 Creating Total and making table longer

To make this data easier to use in our visualisation, let's add a column with the sum of all the ethnic group totals and convert the table from wide to long.

eth_subzone<-eth_subzone %>%
  mutate(total = Chinese_Total+Malay_Total+Indian_Total+Others_Total)

eth_subzone_long<-eth_subzone %>%
  pivot_longer(!`Planning Area`&!`Subzone`&!total, names_to = "ethnicity", values_to = "population")

eth_subzone

By doing this, we can more easily utilise the data when we combine it later with our geospatial data.
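A quick consistency check after reshaping is to confirm that the long rows still add up to the total for each subzone. The sketch below does this on invented toy data; it assumes, as in the code above, that the only non-identifier columns are the four `_Total` columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy table with made-up subzones and counts
toy <- data.frame(
  `Planning Area` = c("Area A", "Area A"),
  Subzone = c("Zone 1", "Zone 2"),
  Chinese_Total = c(100, 50),
  Malay_Total = c(20, 10),
  Indian_Total = c(10, 5),
  Others_Total = c(5, 0),
  check.names = FALSE
)

toy <- toy %>%
  mutate(total = Chinese_Total + Malay_Total + Indian_Total + Others_Total)

toy_long <- toy %>%
  pivot_longer(
    cols = !c(`Planning Area`, Subzone, total),
    names_to = "ethnicity",
    values_to = "population"
  )

# For every subzone, the long-format rows should sum back to the total
check <- toy_long %>%
  group_by(Subzone) %>%
  summarise(summed = sum(population), total = first(total))

print(check)
```

If the real CSV contained additional numeric columns, they would also be pivoted into rows and the two figures would no longer agree, which is what this check is meant to surface.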

4.6 Singapore MP14 Subzone Boundary

The code chunk below imports our geospatial data as a simple feature data frame, which we will use as the base of our map visualisations.

mpsz <- st_read(dsn = "data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")

## Reading layer `MP14_SUBZONE_WEB_PL' from data source `E:\Y4S1\va\in_class-Ex\Assignment_5\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

4.6.1 Set CRS to 3414

Before using this data, let's transform it to EPSG:3414 (SVY21 / Singapore TM) so that the coordinate reference system we work with is consistent throughout.

mpsz<-st_transform(mpsz,3414)

To check that it is set properly we can view the CRS here.

st_crs(mpsz)

## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]
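It may help to see what st_transform() does on data that starts in a different coordinate system. The sketch below is an illustration only: it creates a single made-up point in WGS 84 longitude/latitude, reprojects it to EPSG:3414, and confirms the resulting CRS. Note that st_transform() recomputes the coordinates, whereas st_set_crs() only relabels the CRS without changing them.

```r
library(sf)

# A single illustrative point in WGS 84 (longitude, latitude), roughly
# within the bounding box reported for EPSG:3414
pt_wgs84 <- st_sfc(st_point(c(103.8198, 1.3521)), crs = 4326)

# Reproject to SVY21 / Singapore TM; the coordinates become metres
pt_svy21 <- st_transform(pt_wgs84, 3414)

st_crs(pt_svy21)$epsg
st_coordinates(pt_svy21)
```

The transformed coordinates are eastings and northings in metres, on the same scale as the bounding box printed when the shapefile was read.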

4.6.2 Combining attribute data into subzone

We will be combining mpsz with eth_subzone so that we can geographically visualise the information better. To do this we will have to first change the subzone name in mpsz as it is upper case while the subzone name in eth_subzone is in title case.

mpsz$SUBZONE_N <- str_to_title(mpsz$SUBZONE_N)

After this we can join the geographic data with the attribute data with the following code chunk.

mpsz_eth <- left_join(mpsz, eth_subzone_long, 
                      by = c("SUBZONE_N" = "Subzone"))

head(mpsz_eth)
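Because the join relies on exact name matches, it can be worth checking which boundary names find no partner in the attribute table. The sketch below shows one way to do this with anti_join(); the subzone names are invented, and the mismatch is deliberately planted to show what the check reports.

```r
library(dplyr)
library(stringr)

# Hypothetical names standing in for mpsz$SUBZONE_N (upper case in the shapefile)
boundary_names <- data.frame(
  SUBZONE_N = str_to_title(c("ZONE ALPHA", "ZONE BETA"))
)

# Hypothetical names standing in for the Subzone column of the survey table;
# "Zone-Beta" is spelled differently on purpose
survey_names <- data.frame(
  Subzone = c("Zone Alpha", "Zone-Beta")
)

# Rows of the boundary table that have no matching survey row
unmatched <- anti_join(boundary_names, survey_names,
                       by = c("SUBZONE_N" = "Subzone"))

print(unmatched)
```

Any subzone listed here would end up with NA values for ethnicity and population after the left join.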

Now that everything is combined, we can do a quick visualisation of the subzones based on the total population.

qtm(mpsz_eth, fill = "total")

We can now start to work on our visualisation.

4.7 Population Distribution Choropleth Maps by Ethnicity in Singapore

One way to visualise the geographic distribution of each ethnicity is with a choropleth map. We will create 4 maps, faceted by ethnicity. To allow for easy comparison across the board, we will also set the facets to be synced, so that when you zoom in to one map, you also zoom in to the other maps.

The following code chunk does this.

eth_map2 <- tm_shape(mpsz_eth) +
  tm_fill("population",
          style = "quantile",
          palette = "Blues",
          thres.poly = 0) + 
  tm_facets(by="ethnicity", 
            free.coords=TRUE,
            drop.NA.facets=TRUE,
            drop.empty.facets=TRUE,
            sync=TRUE) +
  tm_layout(legend.show = FALSE,
            title.position = c("center", "center"), 
            title.size = 20) +
  tm_borders(alpha = 0.5)

tmap_mode(mode="plot")
eth_map2

The map above is static because the mode was set with tmap_mode(mode="plot"). To make the map interactive, set tmap_mode(mode="view") instead; this will be used in the final visualisation. In the static map of the ethnic distribution by subzone, darker blue areas indicate subzones where more of each ethnic group's population resides.
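tmap has a static "plot" mode and an interactive "view" mode, and the same map object can be drawn in either. The sketch below shows the switch on its own; the commented-out line marks where a map object such as eth_map2 would be printed.

```r
library(tmap)

# Switch to interactive viewing; tmap_mode() returns the mode that was
# active before the change
previous_mode <- tmap_mode("view")

# eth_map2   # printing a tmap object here would render an interactive map

# Switch back to static plotting
tmap_mode("plot")
```

Since the mode is a session-wide setting, it applies to every tmap object printed afterwards until it is changed again.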

5 Final Visualisation

Now that all the visualisations are complete, we can bring them together and analyse them.

Exploring Singapore's Ethnic Composition in 2015

Sunburst Chart of Singapore Population by Citizenship Type, Ethnicity and Age Group