# suppressing warnings
defaultW <- getOption("warn")
options(warn = -1)
# installing packages
packages = c('corrplot', 'ggpubr', 'tidyverse', 'dplyr')
for(p in packages){
if(!require(p, character.only = TRUE)){install.packages(p); library(p, character.only = TRUE)}}
options(warn = defaultW)
# suppressing warnings
defaultW <- getOption("warn")
options(warn = -1)
# importing libraries
library(data.table)
library(dplyr)
library(ggplot2)
library(tidyverse)
library(viridis)
library(stringr)
library(leaflet)
library(tmap)
library(sf)
options(warn = defaultW)
# suppressing warnings
defaultW <- getOption("warn")
options(warn = -1)
# importing data
# sg mrt data
mrt <- read.csv(file = 'Sg MRT.csv')
# sg population
mpsz <- st_read(dsn = "geospatial",
layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `F:\Arya\MITB\Term2\VA\Assignment 5\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
# lat long mapper
popagsex <- read_csv("respopagesextod2011to2019.csv")
# mrt usage
mrt_usage = read_csv('mrt usage data.csv')
options(warn = defaultW)
We want to establish whether there is a correlation between the population of each planning area, the location of MRT stations, and the number of people boarding the MRT to go to work in each planning area. Through the visualization, we intend to find out:
Whether the construction of MRT stations in a planning area is related to the number of people residing there.
Whether the number of people boarding at the MRT stations in a planning area justifies the number of MRT stations constructed.
The following designs will help us achieve the objective:
The population heatmap will inform the user about the population density in different planning areas and subzones of Singapore.
Population Heatmap
The map will provide the user with information about the number of MRT stations in each planning area.
MRT Distribution
The bar chart will provide the user with information about the number of people boarding the different MRT stations.
MRT Usage
The data files and their sources are listed below:
https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data
https://www.kaggle.com/yxlee245/singapore-train-station-coordinates?select=mrt_lrt_data.csv
The proposed design faces some challenges from the available data. We have listed the challenges and their solutions below:
The population data available to us covers every year from 2011 up to 2019, but we want to use only the latest data for this assignment. The code below shows the years present:
# obtaining unique values of years
unique(popagsex$Time)
## [1] 2011 2012 2013 2014 2015 2016 2017 2018 2019
We filter the data down to the latest year using the code given below:
# filtering for 2019
popagsex_2019 <- popagsex %>%
filter(Time == 2019)
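As a variation on hard-coding the year, the latest year can be derived from the data itself. The sketch below uses base R only and a small made-up data frame (`pop_demo`, a hypothetical stand-in for `popagsex`), and assumes `Time` holds numeric years.

```r
# toy stand-in for popagsex; the values are illustrative, not real data
pop_demo <- data.frame(
  PA   = c("Bedok", "Bedok", "Bishan", "Bishan"),
  Time = c(2018, 2019, 2018, 2019),
  Pop  = c(100, 110, 50, 55)
)

# pick the most recent year present in the data
latest_year <- max(pop_demo$Time)

# keep only the rows for that year
pop_latest <- pop_demo[pop_demo$Time == latest_year, ]

print(latest_year)
print(pop_latest)
```

Deriving the year this way means the filter does not need to be edited if a newer extract of the file is loaded.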
The population data and the lat long map data contain planning area names in different cases (title case in the former, upper case in the latter). The code below shows the difference:
# population data planning area case
head(unique(popagsex_2019$PA))
## [1] "Ang Mo Kio" "Bedok" "Bishan" "Boon Lay" "Bukit Batok"
## [6] "Bukit Merah"
# lat long map data planning area case
head(unique(mpsz$PLN_AREA_N))
## [1] MARINA SOUTH OUTRAM SINGAPORE RIVER BUKIT MERAH
## [5] QUEENSTOWN MARINA EAST
## 55 Levels: ANG MO KIO BEDOK BISHAN BOON LAY BUKIT BATOK ... YISHUN
We will convert the population data’s planning area and subzone columns to upper case using the code below:
popagsex_2019 <- popagsex_2019 %>% mutate_at(.vars = vars(PA, SZ), toupper)
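One way to confirm that the case conversion actually aligns the two sets of names is to compare them with `setdiff()`. The sketch below is base R on made-up vectors (`pop_pa` and `map_pa` are hypothetical stand-ins for the planning area columns of the two datasets), not a check run on the real files.

```r
# toy stand-ins for the planning area names in each dataset (illustrative only)
pop_pa <- c("Ang Mo Kio", "Bedok", "Bishan")
map_pa <- c("ANG MO KIO", "BEDOK", "BISHAN", "MARINA EAST")

# before conversion: population names missing from the map names
unmatched_before <- setdiff(pop_pa, map_pa)

# after conversion: population names still missing from the map names
unmatched_after <- setdiff(toupper(pop_pa), map_pa)

# map areas with no population rows; in a left join from the map
# these would end up with NA population values
map_only <- setdiff(map_pa, toupper(pop_pa))

print(unmatched_before)
print(unmatched_after)
print(map_only)
```

An empty `unmatched_after` suggests the keys line up; anything left in `map_only` is worth inspecting before plotting.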
The population data was aggregated at the subzone and planning area level.
The data was filtered to keep rows where the economically active population is greater than zero.
The population data and the lat-long mapper data were joined on the subzone name to obtain a master dataset for the heatmap.
The population density was calculated on the master dataset by dividing “TOTAL” by “SHAPE_Area” and multiplying by 1e6.
The heatmap was plotted using the “tm_polygons” function. A title was added using “tm_layout”, along with some other cosmetic changes.
Below is the combined code for these steps:
# suppressing warnings
defaultW <- getOption("warn")
options(warn = -1)
# aggregating the population data at the subzone and planning area level
popagsex_2019_grouped <- popagsex_2019 %>%
spread(AG, Pop) %>%
mutate(`ECONOMY ACTIVE` = rowSums(.[9:13])+
rowSums(.[15:17]))%>%
mutate(`AGED`=rowSums(.[18:22])) %>%
mutate(`TOTAL`=rowSums(.[5:22])) %>%
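# filter before select: `ECONOMY ACTIVE` must still exist here, and the
# backticks make it a column reference rather than a string literal
filter(`ECONOMY ACTIVE` > 0) %>%
select(PA, SZ, TOTAL)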
# converting keys to be joined as character in both the datasets
popagsex_2019_grouped$SZ = as.character(popagsex_2019_grouped$SZ)
mpsz$SUBZONE_N = as.character(mpsz$SUBZONE_N)
# joining both the datasets to prepare final heatmap dataset
heatmap_data <- left_join(mpsz, popagsex_2019_grouped,
by = c("SUBZONE_N" = "SZ"))
# calculating the population density on the heatmap data to be plotted
heatmap_data <- heatmap_data %>% mutate(Population_Density = TOTAL / SHAPE_Area * 1e6)
# filtering the relevant columns to be used for plotting
heatmap_data_filtered <- heatmap_data[-c(1:2)]
# plotting
tmap_mode("view")
tm_popden <-
tm_shape(heatmap_data_filtered) +
tm_polygons("Population_Density",
style = "quantile",
palette = "Pastel1") +
tm_layout(title = 'Singapore Population Density by Subzones and Planning Areas') +
tm_credits("Data source: www.singstat.gov.sg")
tm_popden
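The density step above can be checked by hand on toy numbers. The sketch below is base R on a made-up data frame (`density_demo` is hypothetical), and it assumes that `SHAPE_Area` is measured in square metres, so that multiplying by 1e6 gives persons per square kilometre. The second row mimics a subzone with no matching population row after the left join.

```r
# toy stand-in for the joined heatmap data (illustrative values only)
density_demo <- data.frame(
  SUBZONE_N  = c("SUBZONE A", "SUBZONE B"),
  TOTAL      = c(5000, NA),        # NA: no population row matched in the join
  SHAPE_Area = c(2500000, 1000000) # assumed to be in square metres
)

# same formula as in the report: persons per unit area, scaled by 1e6
density_demo$Population_Density <-
  density_demo$TOTAL / density_demo$SHAPE_Area * 1e6

print(density_demo)
```

Unmatched subzones keep an `NA` density rather than zero, so they remain distinguishable from subzones that genuinely have no residents.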