As a global city and an essential trade port, Singapore has long been a melting pot of cultures, races and religions. With so many ethnic groups living side by side, this diversity has not always been easy to manage. Since the founding of modern Singapore, the country has gone through difficult periods of social unrest to reach, and maintain, the delicate balance between its races that all Singaporeans enjoy today.
As such, this project aims to visualise and understand the ethnic distribution of Singapore's population.
It is important for Singapore's government agencies and everyday citizens to understand the ethnic make-up of Singapore, as this allows them to plan resources that bolster social cohesion, ensure a fair and equitable distribution of resources, and prevent ethnic unrest.
Singapore classifies and records four groupings for its ethnic population: Chinese, Malay, Indian and Others.
To understand the distribution, we will use the 2015 General Household Survey (GHS), a large-scale survey that the Singapore Department of Statistics conducts between the decennial Population Censuses.
We will be utilizing data from the following data sources:
Here are some expected challenges and how they will be mitigated.
| Challenges | Mitigations |
|---|---|
| Data are not in the required formats or shapes | Use tidyverse to manipulate the data and prepare it for visualisation |
| Population distribution data is not geospatially mapped | Join the attribute data (ethnic group population distribution) with geospatial data (subzone boundaries) |
| Unsure what type of visualisation to use and how to create it in ggplot | Research possible visualisations and experiment with them as part of the project |
| All of the information needs to be showcased in one visualisation | Create multiple visualisations and combine them into one unified visualisation |
The sketch below shows the expected final visualisation. It is a rough guide that we will follow as we prepare the project in the next few sections.
For the purpose of this project we will be utilising the following packages:
- tidyverse: a set of essential packages for data manipulation and exploration.
- sf: encodes spatial vector data.
- tmap: creates dynamic thematic maps.
- plotly: creates interactive web graphics from 'ggplot2' graphs.
- tmaptools: tools for reading and processing spatial data.
- gridExtra: provides a number of user-level functions to work with "grid" graphics, notably to arrange multiple grid-based plots on a page.
- sunburstR: makes interactive 'd3.js' sequence sunburst diagrams in R.
- htmltools: tools for HTML generation and output.
- d3r: provides a suite of functions to ease the use of 'd3.js' in R.

The following code chunk checks whether each required package is installed in your environment, installs it if it is not, and then loads it.
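packages <- c('tidyverse', 'sf', 'tmap', 'plotly', 'tmaptools', 'sunburstR', 'htmltools', 'd3r', 'gridExtra')
for (p in packages) {
  # require() returns FALSE (with a warning) when the package is not installed
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  # Attach the package, which is now guaranteed to be installed
  library(p, character.only = TRUE)
}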
We will start by importing ethnicity data for 2015, which gives the number of people of each ethnicity living in Singapore by age group and by citizenship type (Permanent Resident, Singapore Citizen, etc.).
eth_citizenship <- read_csv('data/aspatial/Ethnic_Mix_Split3.csv')
eth_citizenship1 <- read_csv('data/aspatial/Ethnic_Mix_Split2.csv')
head(eth_citizenship)
As the data in eth_citizenship has a pseudo-hierarchical structure, we can visualise it as a sunburst diagram.
To make the data easier to use in our visualisation, let's first convert the table from wide to long.
eth_citizenship<-eth_citizenship %>%
pivot_longer(!`Age Group (Years)`&!`Citizenship`, names_to = "ethnicity", values_to = "population")
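To make the reshaping concrete, here is a small sketch on a made-up table with the same two identifier columns; the ethnicity columns and all numbers are hypothetical. Each wide row with k ethnicity columns becomes k long rows, one per (ethnicity, population) pair.

```r
library(tibble)
library(tidyr)

# Hypothetical toy table (made-up numbers) in the same wide shape as
# Ethnic_Mix_Split3.csv: one row per age group and citizenship type,
# one column per ethnicity.
toy_wide <- tibble(
  `Age Group (Years)` = c("0 - 4", "0 - 4"),
  Citizenship = c("Singapore Citizen", "Permanent Resident"),
  Chinese = c(100, 20),
  Malay = c(40, 5)
)

# Same selection as above: every column except the two identifiers is
# gathered into an (ethnicity, population) pair, one row per pair.
toy_long <- pivot_longer(
  toy_wide,
  !`Age Group (Years)` & !Citizenship,
  names_to = "ethnicity",
  values_to = "population"
)

toy_long
```

The two wide rows become four long rows, and the identifier columns are repeated on each of them.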
The sunburst radiates outward in levels: the first level is citizenship type, followed by ethnicity, then age group.
To build this visualisation, we first rearrange the columns so that the first level is on the left of the table and the last level is on the right, and then build a nested tree (hierarchy) from the data for the sunburst to visualise. The following code chunks do this.
eth_citizenship <- eth_citizenship%>%
select(Citizenship,ethnicity,`Age Group (Years)`,population)
tree <- d3_nest(eth_citizenship, value_cols = "population")
sb3 <- sunburst(
tree,
legend = list(w=250), # make extra room for our legend
count = TRUE,
width = "100%",
height = 600,
valueField = "population"
)
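d3_nest() turns the flat table into a nested hierarchy that sunburst() draws as rings, with the leftmost column as the innermost ring. The sketch below is not d3r's implementation (d3r emits d3-style JSON); it only mimics the idea in base R with nested named lists, using hypothetical toy rows and a hypothetical helper nest_levels(), to show why the column order chosen with select() determines the ring order.

```r
# Hypothetical toy rows in the column order used above:
# Citizenship -> ethnicity -> age group -> population.
toy <- data.frame(
  Citizenship = c("Singapore Citizen", "Singapore Citizen", "Permanent Resident"),
  ethnicity = c("Chinese", "Malay", "Chinese"),
  age = c("0 - 4", "0 - 4", "0 - 4"),
  population = c(100, 40, 20),
  stringsAsFactors = FALSE
)

# Recursively split the rows on the leftmost remaining level column.
# Each level becomes a named list of children; the deepest level holds
# the summed value, like the leaves of the sunburst.
nest_levels <- function(df, level_cols, value_col) {
  if (length(level_cols) == 0) {
    return(sum(df[[value_col]]))
  }
  groups <- split(df, df[[level_cols[1]]])
  lapply(groups, nest_levels, level_cols = level_cols[-1], value_col = value_col)
}

tree_sketch <- nest_levels(toy, c("Citizenship", "ethnicity", "age"), "population")
str(tree_sketch)
```

Reordering the level columns (for example, ethnicity first) would change which variable forms the innermost ring.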
Another useful visualisation is the population pyramid, which agencies can use to understand how the population of each ethnicity is distributed across age groups and gender. This may help them design targeted policies, such as encouraging higher birth rates or offering support to the sandwich generation.
Before building the visualisation we need to prepare the data. First, we multiply all male population counts by -1 so that the two sides of the pyramid extend in opposite directions from zero; then we convert the table from wide to long so that it is easier to plot.
Additionally, we need to convert `Age Group (Years)` into a factor so that the age groups are plotted in their original order.
eth_citizenship1 <- eth_citizenship1 %>%
  # Negate the male counts so they plot to the left of the zero line
  mutate(across(ends_with("_Males"), ~ .x * -1)) %>%
  pivot_longer(!`Age Group (Years)` & !`Citizenship`, names_to = "ethnicity", values_to = "population")
# Each age group now repeats once per ethnicity/citizenship row, so use the
# unique values (in order of first appearance) as the factor levels
eth_citizenship1$`Age Group (Years)` <- factor(eth_citizenship1$`Age Group (Years)`,
                                                levels = unique(eth_citizenship1$`Age Group (Years)`))
eth_citizenship1
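The sketch below reproduces the two preparation steps on a hypothetical toy table (base R's endsWith() stands in for dplyr's ends_with()). It also shows why the factor levels must be de-duplicated: after pivot_longer() each age group appears once per ethnicity column and citizenship type, and current versions of R stop with an error when factor() is given duplicated levels.

```r
# Hypothetical long table after pivot_longer(): every age group appears once
# per ethnicity/sex column, so the age column contains repeated values.
toy <- data.frame(
  age = c("0 - 4", "0 - 4", "5 - 9", "5 - 9"),
  ethnicity = c("Chinese_Males", "Chinese_Females", "Chinese_Males", "Chinese_Females"),
  population = c(120, 110, 130, 125),
  stringsAsFactors = FALSE
)

# Flip the sign of the male counts so they plot to the left of zero.
is_male <- endsWith(toy$ethnicity, "_Males")
toy$population[is_male] <- -toy$population[is_male]

# Passing the raw column as `levels` fails because it contains duplicates;
# capture the error message instead of stopping.
dup_error <- tryCatch(
  factor(toy$age, levels = toy$age),
  error = function(e) conditionMessage(e)
)

# unique() keeps the first-seen (file) order of the age groups.
toy$age <- factor(toy$age, levels = unique(toy$age))
levels(toy$age)
```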
Once the data is prepared, we can build the population pyramids. We will use ggplotly to make the plots interactive so that users can explore them. The following code chunk does this.
pyramid_c <- ggplot(eth_citizenship1, aes(x = `Age Group (Years)`, y = population, fill = ethnicity)) +
geom_bar(data = subset(eth_citizenship1, ethnicity == "Chinese_Females"), stat = "identity") +
geom_bar(data = subset(eth_citizenship1, ethnicity == "Chinese_Males"), stat = "identity") +
scale_y_continuous(breaks = seq(-15000000, 15000000, 5000000),
labels = paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")) +
coord_flip() +
theme(legend.title = element_blank())+
scale_fill_manual(values = c("#a6cee3","#1f78b4"))
pyramid_m <- ggplot(eth_citizenship1, aes(x = `Age Group (Years)`, y = population, fill = ethnicity)) +
geom_bar(data = subset(eth_citizenship1, ethnicity == "Malay_Females"), stat = "identity") +
geom_bar(data = subset(eth_citizenship1, ethnicity == "Malay_Males"), stat = "identity") +
scale_y_continuous(breaks = seq(-15000000, 15000000, 5000000),
labels = paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")) +
coord_flip() +
theme(legend.title = element_blank())+
scale_fill_manual(values = c("#b2df8a","#33a02c"))
pyramid_i <- ggplot(eth_citizenship1, aes(x = `Age Group (Years)`, y = population, fill = ethnicity)) +
geom_bar(data = subset(eth_citizenship1, ethnicity == "Indian_Females"), stat = "identity") +
geom_bar(data = subset(eth_citizenship1, ethnicity == "Indian_Males"), stat = "identity") +
scale_y_continuous(breaks = seq(-15000000, 15000000, 5000000),
labels = paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")) +
coord_flip() +
theme(legend.title = element_blank())+
scale_fill_manual(values = c("#fb9a99","#e31a1c"))
pyramid_o <- ggplot(eth_citizenship1, aes(x = `Age Group (Years)`, y = population, fill = ethnicity)) +
geom_bar(data = subset(eth_citizenship1, ethnicity == "Others_Females"), stat = "identity") +
geom_bar(data = subset(eth_citizenship1, ethnicity == "Others_Males"), stat = "identity") +
scale_y_continuous(breaks = seq(-15000000, 15000000, 5000000),
labels = paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")) +
coord_flip() +
theme(legend.title = element_blank())+
scale_fill_manual(values = c("#fdbf6f","#ff7f00"))
p_c <-ggplotly(pyramid_c)
p_m <-ggplotly(pyramid_m)
p_i <-ggplotly(pyramid_i)
p_o <-ggplotly(pyramid_o)
sp<-subplot(p_c,p_m,p_i,p_o,nrows=2, shareY=TRUE)
sp
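The four pyramid blocks above differ only in the ethnicity prefix and the two colours, so one option is to wrap them in a function. The sketch below uses a hypothetical helper make_pyramid() and a hypothetical label function millions_abs(), neither of which is in the original code; it swaps geom_bar(stat = "identity") for the equivalent geom_col() and derives the tick labels from the break values instead of hard-coding them. The toy data is made up.

```r
library(ggplot2)

# Hypothetical label function: absolute values in millions, so the male bars
# (stored as negative numbers) get positive tick labels.
millions_abs <- function(x) paste0(abs(x) / 1e6, "m")

# Hypothetical helper: builds one pyramid for an ethnicity prefix, e.g.
# "Chinese" selects the "Chinese_Females" and "Chinese_Males" rows.
make_pyramid <- function(data, prefix, colours) {
  rows <- data[data$ethnicity %in% paste0(prefix, c("_Females", "_Males")), ]
  ggplot(rows, aes(x = `Age Group (Years)`, y = population, fill = ethnicity)) +
    geom_col() +                      # same as geom_bar(stat = "identity")
    scale_y_continuous(labels = millions_abs) +
    coord_flip() +
    theme(legend.title = element_blank()) +
    scale_fill_manual(values = colours)
}

# Toy long table (made-up numbers) shaped like eth_citizenship1 after preparation.
toy <- data.frame(
  `Age Group (Years)` = factor(c("0 - 4", "0 - 4", "5 - 9", "5 - 9", "0 - 4")),
  ethnicity = c("Chinese_Males", "Chinese_Females", "Chinese_Males",
                "Chinese_Females", "Malay_Males"),
  population = c(-120000, 110000, -130000, 125000, -50000),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

p <- make_pyramid(toy, "Chinese", c("#a6cee3", "#1f78b4"))
p
```

With the prepared data, a call such as `ggplotly(make_pyramid(eth_citizenship1, "Malay", c("#b2df8a", "#33a02c")))` could then stand in for one of the four blocks, assuming the column names shown above.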
Now let's bring in the ethnicity split of Singapore's resident population by the subzone where they reside.
eth_subzone <- read_csv('data/aspatial/Ethnic_Mix_Subzone1.csv')
head(eth_subzone)
To make this data easier to use in our visualisation, let's add a column with the total across all ethnic groups and convert the table from wide to long.
eth_subzone<-eth_subzone %>%
mutate(total = Chinese_Total+Malay_Total+Indian_Total+Others_Total)
eth_subzone_long<-eth_subzone %>%
pivot_longer(!`Planning Area`&!`Subzone`&!total, names_to = "ethnicity", values_to = "population")
eth_subzone
By doing this, we can more easily utilise the data when we combine it later with our geospatial data.
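One thing to keep in mind with the long format: because total is excluded from the pivot, its value is copied onto every ethnicity row of a subzone. The sketch below, on hypothetical subzones with made-up counts, shows that summing total over the long table (or over the joined spatial table built later) overcounts, while summing population does not.

```r
library(dplyr)
library(tidyr)

# Hypothetical subzone rows (made-up names and numbers) in the wide shape of
# Ethnic_Mix_Subzone1.csv.
toy <- tibble(
  `Planning Area` = c("Area X", "Area X"),
  Subzone = c("Subzone A", "Subzone B"),
  Chinese_Total = c(800, 600),
  Malay_Total = c(100, 150),
  Indian_Total = c(80, 50),
  Others_Total = c(20, 0)
)

# Same steps as above: add the row total, then pivot everything except the
# identifiers and the total.
toy_long <- toy %>%
  mutate(total = Chinese_Total + Malay_Total + Indian_Total + Others_Total) %>%
  pivot_longer(!`Planning Area` & !Subzone & !total,
               names_to = "ethnicity", values_to = "population")

# `total` repeats on every ethnicity row of its subzone, so summing it over
# the long table counts each subzone once per ethnicity column.
sum(toy_long$total)        # 7200 = 4 x (1000 + 800)
sum(toy_long$population)   # 1800, the actual resident total
```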
The code chunk below imports our geospatial data (the subzone boundaries) as a simple feature data frame, which we will use for the geospatial visualisation.
mpsz <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `E:\Y4S1\va\in_class-Ex\Assignment_5\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
Before using this data, let's transform it to EPSG:3414 (SVY21 / Singapore TM) so that all the projections we work with are consistent.
mpsz<-st_transform(mpsz,3414)
To check that it is set properly we can view the CRS here.
st_crs(mpsz)
## Coordinate Reference System:
## User input: EPSG:3414
## wkt:
## PROJCRS["SVY21 / Singapore TM",
## BASEGEOGCRS["SVY21",
## DATUM["SVY21",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4757]],
## CONVERSION["Singapore Transverse Mercator",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["northing (N)",north,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["easting (E)",east,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["Singapore"],
## BBOX[1.13,103.59,1.47,104.07]],
## ID["EPSG",3414]]
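As a quick sanity check of the parameters printed above, the natural origin of the projection should map onto the false easting and false northing. The sketch below projects that point from WGS 84 longitude/latitude (EPSG:4326) into EPSG:3414. It assumes PROJ applies no meaningful datum shift between WGS 84 and SVY21 (which, as printed above, uses the WGS 84 ellipsoid), so the result should agree to within about a metre.

```r
library(sf)

# Natural origin from the printout above, as a WGS 84 (lon, lat) point.
origin <- st_sfc(st_point(c(103.833333333333, 1.36666666666667)), crs = 4326)

# Project into SVY21 / Singapore TM; expect roughly the false easting
# (28001.642) and false northing (38744.572).
origin_svy21 <- st_transform(origin, 3414)
xy <- st_coordinates(origin_svy21)
xy
```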
We will combine mpsz with the subzone ethnicity data so that we can visualise the information geographically. To do this, we first have to convert the subzone names in mpsz to title case, as they are in upper case while the subzone names in eth_subzone are in title case.
mpsz$SUBZONE_N <- str_to_title(mpsz$SUBZONE_N)
After this we can join the geographic data with the attribute data with the following code chunk.
mpsz_eth <- left_join(mpsz, eth_subzone_long,
by = c("SUBZONE_N" = "Subzone"))
head(mpsz_eth)
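Title-casing fixes differences in letter case, but any other spelling difference between the two sources silently produces NA rows after left_join(). A minimal check, sketched below on hypothetical name lists, is dplyr's anti_join(), which returns the rows that found no partner. "One-North" is an illustrative spelling variant, not one known to occur in the actual data.

```r
library(dplyr)
library(stringr)

# Hypothetical name lists: upper-case names as in the subzone shapefile and
# title-case names as in the ethnicity CSV.
shape_names <- tibble(SUBZONE_N = c("ANG MO KIO TOWN CENTRE", "CITY HALL", "ONE NORTH"))
csv_names <- tibble(Subzone = c("Ang Mo Kio Town Centre", "City Hall", "One-North"))

# Same normalisation as above.
shape_names <- shape_names %>% mutate(SUBZONE_N = str_to_title(SUBZONE_N))

# Shapefile rows with no matching CSV row; after left_join() these
# subzones would carry NA population and total.
unmatched <- anti_join(shape_names, csv_names, by = c("SUBZONE_N" = "Subzone"))
unmatched$SUBZONE_N
```

Running the same anti_join() on mpsz and eth_subzone_long before the join would list any subzones that need manual renaming.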
Now that everything is combined, we can do a quick visualisation of the subzones based on their total population.
qtm(mpsz_eth, fill = "total")
We can now start to work on our visualisation.
One way to visualise the geographic distribution of each ethnicity is with a choropleth map. We will create four maps, faceted by ethnicity. To allow easy comparison across the maps, we also sync the facets, so that when you zoom in on one map in interactive mode, the other maps zoom in as well.
The following code chunk does this.
eth_map2 <- tm_shape(mpsz_eth) +
tm_fill("population",
style = "quantile",
palette = "Blues",
thres.poly = 0) +
tm_facets(by="ethnicity",
free.coords=TRUE,
drop.NA.facets=TRUE,
drop.empty.facets=TRUE,
sync=TRUE) +
tm_layout(legend.show = FALSE,
title.position = c("center", "center"),
title.size = 20) +
tm_borders(alpha = 0.5)
tmap_mode(mode="plot")
eth_map2
To make the map interactive, we can switch to tmap_mode(mode="view"); tmap_mode(mode="plot"), used above, produces a static map. The interactive mode will be used in the final visualisation. For now, we can view the static visualisation of the ethnic distribution by subzone: the darker blue areas indicate where more of each ethnic group's population resides.
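The style = "quantile" setting places the class breaks at quantiles of the values, so each colour class holds roughly the same number of subzones, and a darker shade means a higher-ranked class rather than a fixed population count. The sketch below reproduces the idea with base R's quantile() on made-up values; it assumes five classes (tmap's usual default) and is not tmap's own code, which computes the breaks through the classInt package.

```r
# Hypothetical subzone populations (made-up values).
values <- 1:100
n_classes <- 5

# Breaks at the 0%, 20%, ..., 100% quantiles of the data.
breaks <- quantile(values, probs = seq(0, 1, length.out = n_classes + 1))

# Assign each value to a class; include.lowest keeps the minimum in class 1.
classes <- cut(values, breaks = breaks, include.lowest = TRUE)
table(classes)
```

Here every class ends up with 20 of the 100 values, which is what makes quantile maps good at showing rank but poor at showing absolute differences.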
Now that all the visualisations are complete, we can view them together and analyse them.
The final visualisation provides insights into the demographics (citizenship, age, gender) and geographic distribution of each ethnic group. These will help agencies plan and target policies better and identify problems. There are quite a few interesting insights that we can glean from these visualisations, such as:
A work by Rajiv Abraham Xavier