Relevant Installations and Imports

Packages

# suppressing warnings
defaultW <- getOption("warn")
options(warn = -1) 

# installing packages
packages = c('corrplot', 'ggpubr', 'tidyverse', 'dplyr')

for(p in packages){
  if(!require(p, character.only = T)){install.packages(p)}}

options(warn = defaultW)

Libraries

# suppressing warnings
defaultW <- getOption("warn")
options(warn = -1) 

# importing libraries
library(data.table)
library(dplyr)
library(ggplot2)
library(tidyverse)
library(viridis)
library(stringr)
library(leaflet)
library(tmap)
library(sf)
options(warn = defaultW)

Data

# suppressing warnings
defaultW <- getOption("warn")
options(warn = -1) 

# importing data

# sg mrt data
mrt <- read.csv(file = 'Sg MRT.csv')

# sg population
mpsz <- st_read(dsn = "geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `F:\Arya\MITB\Term2\VA\Assignment 5\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
# lat long mapper
popagsex <- read_csv("respopagesextod2011to2019.csv")

# mrt usage
mrt_usage = read_csv('mrt usage data.csv')

options(warn = defaultW)

What did we want to achieve?

We want to find and establish a correlation between the population in each planning area, location of MRT stations and the number of people boarding the MRT to go to work in each planning area. Through the vizualization, we intend to find out -

  • Whether the construction of MRT stations in a planning area related to the number of people residing there.

  • Whether the number of people boarding the MRT stations in a planning area justify the number of MRT stations constructed.

The following designs will help us achieve the objective:

  • Population Heatmap

The population heatmap will inform the user about the population density in different planning areas and subzones of Singapore.

Population Heatmap

Population Heatmap

  • MRT distribution map

The map will provide the user with information about the number of MRT stations in each planning area.

MRT Distribution

MRT Distribution

  • People boarding MRT stations bar chart

The bar chart will provide the user with information about the number of people boarding the different MRT stations.

MRT Usage

MRT Usage

What was the source of data?

The following is the detail of data files and their sources -

  • The number of people boarding the MRT stations and the population distribution by planning area:

https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data

  • The number of MRT stations:

https://www.kaggle.com/yxlee245/singapore-train-station-coordinates?select=mrt_lrt_data.csv

  • The Singpore latitude longitude mapper data:

https://elearn.smu.edu.sg/d2l/le/content/263168/Home

What were the challenges and how did we resolve them?

The proposed design faces some challenges from the available data. We have listed the challenges and their solutions below:

Challenge 1:

The population data available to us is for all the years starting 2011 upto 2019. But we want to use the latest data for the purpose of this assignment. Below is the code to show the same:

# obtaining unique values of years
unique(popagsex$Time)
## [1] 2011 2012 2013 2014 2015 2016 2017 2018 2019

Solution 1:

We would filter the data to our needs using the code give below:

# filtering for 2019
popagsex_2019 <- popagsex %>%
  filter(Time == 2019)

Challenge 2:

The population data and the lat long map data contain planning area names in different cases (u/l). Below is the code to show the same:

# population data planning area case
head(unique(popagsex_2019$PA))
## [1] "Ang Mo Kio"  "Bedok"       "Bishan"      "Boon Lay"    "Bukit Batok"
## [6] "Bukit Merah"
# population data planning area case
head(unique(mpsz$PLN_AREA_N))
## [1] MARINA SOUTH    OUTRAM          SINGAPORE RIVER BUKIT MERAH    
## [5] QUEENSTOWN      MARINA EAST    
## 55 Levels: ANG MO KIO BEDOK BISHAN BOON LAY BUKIT BATOK ... YISHUN

Solution 2:

We will convert tge population data’s planning area column to upper case. Below is the code to show the same:

popagsex_2019 <- popagsex_2019 %>% mutate_at(.vars = vars(PA, SZ), toupper)

How were the plots created?

1. Population spatial distribution

  • The population data was aggregated at the subzone and planning area level.

  • The data was filtered for economy active greater than zero.

  • The population data and the lat-long mapper data were joined based on the planning area to obtain a master dataset to prepare heatmap.

  • The population density was calculated on the master dataset by dividing the “TOTAL” by “SHAPE_Area”.

  • Heat map was plotted using the “tm_polygon” function. Title was added using the “tm_layout” feature with some other cosmetic changes.

Below is the combined code for the same:

# suppressing warnings
defaultW <- getOption("warn")
options(warn = -1) 


# aggregating the population data at the subzone and planning area level
popagsex_2019_grouped <- popagsex_2019 %>%
spread(AG, Pop) %>%
mutate(`ECONOMY ACTIVE` = rowSums(.[9:13])+
rowSums(.[15:17]))%>%
mutate(`AGED`=rowSums(.[18:22])) %>%
mutate(`TOTAL`=rowSums(.[5:22])) %>%
select('PA', 'SZ', 
       'TOTAL',) %>%
filter('ECONOMY ACTIVE' > 0)

# converting keys to be joined as character in both the datasets
popagsex_2019_grouped$SZ = as.character(popagsex_2019_grouped$SZ)
mpsz$SUBZONE_N = as.character(mpsz$SUBZONE_N)

# joining both the datasets to prepare final heatmap dataset
heatmap_data <- left_join(mpsz, popagsex_2019_grouped, 
                              by = c("SUBZONE_N" = "SZ"))

# calculating the population density on the heatmap data to be plotted
heatmap_data <- heatmap_data %>% mutate(Population_Density = TOTAL / SHAPE_Area * 1e6)

# filtering the relevant columns to be used for plotting
heatmap_data_filtered <- heatmap_data[-c(1:2)]

# plotting
tmap_mode("view")
tm_popden <- 
  tm_shape(heatmap_data_filtered) +
  tm_polygons("Population_Density", 
          style = "quantile", 
          palette = "Pastel1") +
  tm_layout(title = 'Singapore Population Density by Subzones and Planning Areas') +
  tm_credits("Data source: www.singstat.gov.sg")

tm_popden
options(warn = defaultW)

2. MRT stations spatial distribution

  • There were no major data preparation steps involved for the MRT stations plot as the data was readily available with the latitude and longitude information and the name of the MRT stations.

  • Used the leaflet package to build the plot. The plot is fully interactive and supports two kind of views - lite and geo.

  • Top view shows the clusters of MRT stations in an area. One can click on a cluster and zoom in to that area to drill down into the distribution of MRT stations further.

# suppressing warnings
defaultW <- getOption("warn")
options(warn = -1) 

# plotting MRT stations using leaflet package
mrt %>%
leaflet() %>%
addProviderTiles(providers$Esri.WorldImagery, group = "World Imagery") %>%
addProviderTiles(providers$Stamen.TonerLite, group = "Toner Lite") %>%
addLayersControl(baseGroups = c("Map Lite", "Contour")) %>%
addMarkers(label = mrt$station_name,
           clusterOptions = markerClusterOptions(),
           popup = mrt$station_name) %>%
addControl('MRT Distribution by Planning Area', position = "topleft", className="map-title")
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
options(warn = defaultW)

3. Population distribution boarding MRT stations

  • The mrt usage data was readily available to plot and so not much data preparation was required for this one.

  • Sort the data descending and filter top 5, because we only want to focus on top 5 planning areas withe the highest number of people boarding the MRT daily.

  • The plot was made using the “bar_plot” function and was ordered decending.

# suppressing warnings
defaultW <- getOption("warn")
options(warn = -1) 

# sorting decending and filtering top 5 only
mrt_usage_sorted <- mrt_usage[with(mrt_usage, order(-mrt_usage)), ]
mrt_usage_top_5 = mrt_usage_sorted %>%  filter(row_number() <= 5)

# plotting number of people boarding major MRT stations
p<-ggplot(data=mrt_usage_top_5, aes(x=planning_area, y=mrt_usage)) +
   geom_bar(stat = "identity", fill="steelblue", width=0.5)+
  labs(title = 'Number of people boarding MRT daily',
       x = 'Planning Area',
       y = 'No. of people boarding MRT (in thousands)')+ coord_flip()

p

options(warn = defaultW)

What were the insights?

  • From the bar chart we can observe that the Sengkang planning tops the list of the number of people that board the MRT stations in a planning area daily. followed by Woodlands and Yishun.

  • From the MRT distribution map we can zoom into the three planning areas with the highest number of mrt usage (as mentioned above) and observe that Sengkang has the second largest number of MRT stations i.e., 23. But the following stations Woodlands and Yishun have only 3 and 4 MRT stations respectively.

  • Also, from the population density distribution map, it can be observed that the population density is quite low in Woodlands and Yishun circa under 940/M and quite high in Sengkang, circa over 5000/M

What was the conclusion?

  • Combining insights #1 and #3, it can be concluded that the intial construction of MRT stations must have been done in accordance with the population density of planning areas. This justifies why Sengkang should have 23 MRT stations as compared 3 and 4 of Woodlands and Yishun respectively.

  • Combining insights #1 and #2, it can be concluded as the migrant population started to increase in Singapore, more and more people started to occupy dwellings/condos built in the outskirts of the city and hence are boarding the MRT in huge numbers from these areas.

  • Therefore, finally concluding from the above two conclusions, the number of MRT stations are not distributed well especially in the outskirts of Singapore where the migrant population has acquired a alot of dwellings and use the MRT in huge numbers daily.