Inspired by this article, the objective of this data visualisation is to explore the disparity between planning areas in Singapore.
The datasets used are:
The first main challenge was finding a way to aggregate the available data on monthly household income in each planning area. The data obtained from SingStat is split by income ranges such as “Below $1,000”, “$1,000 - $1,999” …, but I wanted a single value that captures the income of the whole planning area. One option was to find the median household income, while another option was to calculate the mean household income. In section 5.4, I will explain how I calculated the mean household income.
The planning areas in the GHS 2015 dataset refer to areas demarcated in the URA’s Master Plan 2014, which is what I use in this visualisation. However, some planning areas have no data, which leaves gaps in the analysis. One possible reason is that these planning areas, such as Tuas and Western Water Catchment, have few resident households.
design sketch
packages <- c('sf', 'tmap', 'tidyverse', 'treemap', 'highcharter')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
## Loading required package: sf
## Warning: package 'sf' was built under R version 3.6.3
## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
## Loading required package: tmap
## Warning: package 'tmap' was built under R version 3.6.3
## Loading required package: tidyverse
## Warning: package 'tidyverse' was built under R version 3.6.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 3.6.3
## Warning: package 'tibble' was built under R version 3.6.3
## Warning: package 'tidyr' was built under R version 3.6.3
## Warning: package 'readr' was built under R version 3.6.3
## Warning: package 'purrr' was built under R version 3.6.3
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'stringr' was built under R version 3.6.3
## Warning: package 'forcats' was built under R version 3.6.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: treemap
## Warning: package 'treemap' was built under R version 3.6.3
## Loading required package: highcharter
## Warning: package 'highcharter' was built under R version 3.6.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
if (!require('devtools', character.only= T)){
install.packages('devtools')
}
## Loading required package: devtools
## Warning: package 'devtools' was built under R version 3.6.3
## Loading required package: usethis
## Warning: package 'usethis' was built under R version 3.6.3
require(devtools)
install_github("timelyportfolio/d3treeR")
## WARNING: Rtools is required to build R packages, but is not currently installed.
##
## Please download and install Rtools 3.5 from https://cran.r-project.org/bin/windows/Rtools/.
## Skipping install of 'd3treeR' from a github remote, the SHA1 (ebb833db) has not changed since last install.
## Use `force = TRUE` to force installation
library(d3treeR)
Use the st_read() function of the sf package to import the MP14_PLNG_AREA_WEB_PL shapefile into R as a simple feature data frame called mp.
mp <- st_read(dsn = "data",
layer = "MP14_PLNG_AREA_WEB_PL")
## Reading layer `MP14_PLNG_AREA_WEB_PL' from data source `C:\Users\Leandra Faith Lee\Downloads\IS428 Visual Analytics for Business Intelligence\Assignment5\data' using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
Use the read_csv() function to import the csv file into R. The first row contains the total for all planning areas, so I filter to remove this row. Create a new column, prop_rich, to store the percentage of households earning $20,000 or more a month, to be used for the treemap visualisation later. Change all the planning area names to uppercase, so that they match the names in the map data and the two can be joined later.
monthly_hh_income <- read_csv("data/monthly_household_income.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## `Planning Area` = col_character(),
## Total = col_number(),
## `$13,000 - $13,999` = col_character(),
## `$14,000 - $14,999` = col_character(),
## `$17,500 - $19,999` = col_character()
## )
## i Use `spec()` for the full column specifications.
monthly_hh_income <- monthly_hh_income %>%
filter(`Planning Area` != "Total") %>%
mutate(`prop_rich` = (`$20,000 & Over`/Total)*100) %>%
mutate_at(.vars = vars(`Planning Area`), toupper)
# monthly_hh_income
For the treemap visualisation, I want to show the hierarchical structure of regions and planning areas. However, the monthly household income dataset does not have a column for Region, so I have to obtain this using a join with the map data. Since the geometry is not needed for treemap, I can drop it using the st_drop_geometry function.
mp_pa_region <- select(mp, c(`PLN_AREA_N`, `REGION_N` ))
region_hh_income <- left_join(mp_pa_region, monthly_hh_income,
by = c("PLN_AREA_N" = "Planning Area"))
region_hh_income <- st_drop_geometry(region_hh_income)
# region_hh_income
Remove the rows with missing values (NA), which belong to planning areas that have no household income data.
region_hh_income_nonNA <- region_hh_income %>%
  filter(!is.na(Total))
# region_hh_income_nonNA
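The join-then-filter steps above can be sketched on a toy example in base R. This is only an illustration under stated assumptions: the two small data frames and their household counts are made up, and merge() with all.x = TRUE stands in for the left_join() used above. It shows how a planning area that is in the map but not in the income data ends up with NA, and how is.na() identifies and removes it.

```r
# Toy map attributes (hypothetical subset; no geometry needed here)
map_areas <- data.frame(
  PLN_AREA_N = c("BEDOK", "TUAS", "TANGLIN"),
  REGION_N   = c("EAST REGION", "WEST REGION", "CENTRAL REGION"),
  stringsAsFactors = FALSE
)

# Toy income data with made-up totals; TUAS is deliberately absent
income <- data.frame(
  `Planning Area` = c("Bedok", "Tanglin"),
  Total           = c(80, 6),
  check.names     = FALSE,
  stringsAsFactors = FALSE
)

# Uppercase the names so the join keys match the map data
income$`Planning Area` <- toupper(income$`Planning Area`)

# Base-R equivalent of a left join: keep every row of map_areas
joined <- merge(map_areas, income,
                by.x = "PLN_AREA_N", by.y = "Planning Area",
                all.x = TRUE)

# Planning areas with no income data have NA in Total
missing_areas <- joined$PLN_AREA_N[is.na(joined$Total)]

# Keep only the rows that have data
joined_nonNA <- joined[!is.na(joined$Total), ]
```

Listing missing_areas before dropping rows is a quick way to confirm that the gaps come from planning areas without data rather than from mismatched names.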
Use the treemap package to create a static treemap.
static_tree <- treemap(region_hh_income_nonNA,
index = c("REGION_N", "PLN_AREA_N"),
vSize = "Total",
vColor = "prop_rich",
type = "value",
fontsize.labels=12,
title="Percentage of households earning at least S$20,000 a month",
fontsize.title=16,
fontsize.legend=16,
title.legend="Percentage of households (%)")
# help(treemap)
Use the d3tree() function from the d3treeR package to create an interactive treemap based on the static treemap above.
inter_treemap <- d3tree(static_tree, rootname="Percentage of households earning at least S$20,000 a month")
inter_treemap
First, load the data.
indiv_income <- read_csv("data/working-persons-monthly-income.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## year = col_double(),
## level_1 = col_character(),
## level_3 = col_character(),
## value = col_double()
## )
# indiv_income
In this visualisation, I want to compare the proportion of residents whose monthly income falls in the highest and lowest income brackets across the planning areas. Thus, I filter for “Below $1,000” and “$12,000 & Over”. Create a new column ‘sum’ to store, for each planning area, the total number of residents in these two brackets combined. Create a new column ‘Percentage’ to store each bracket’s share of that total, so the two percentages for a planning area add up to 100.
income_bar <- indiv_income %>% filter(level_1 == "Below $1,000" | level_1 == "$12,000 & Over")
income_bar$sum <- ave(income_bar$value, income_bar$level_3, FUN=sum)
income_bar <- income_bar %>%
rename("PlanningArea" = level_3, "IncomeRange" = level_1) %>%
mutate(`Percentage` = `value`/`sum` * 100 )
# income_bar
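As a small check on how these percentages behave, here is a sketch on made-up numbers in base R. The area names and counts are hypothetical; only the column names level_1, level_3 and value follow the dataset above. ave() repeats each planning area's combined total on every row of that area, so dividing by it gives each bracket's share within the two brackets.

```r
# Hypothetical counts for two made-up planning areas
indiv <- data.frame(
  level_1 = c("Below $1,000", "$12,000 & Over",
              "Below $1,000", "$12,000 & Over"),
  level_3 = c("Area A", "Area A", "Area B", "Area B"),
  value   = c(30, 10, 5, 15),
  stringsAsFactors = FALSE
)

# Per-area total of the two brackets, repeated on each row of the area
indiv$sum <- ave(indiv$value, indiv$level_3, FUN = sum)

# Share of each bracket within its planning area
indiv$Percentage <- indiv$value / indiv$sum * 100
```

Because the denominator only covers the two extreme brackets, the percentages describe the balance between them, not their share of all working residents in the planning area.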
Use the highcharter package to create an interactive stacked bar plot.
options(highcharter.theme = hc_theme_smpl(tooltip = list(valueDecimals = 2)))
stacked <- income_bar %>%
hchart(
'bar', hcaes(x = 'PlanningArea', y = 'Percentage', group = 'IncomeRange'),
stacking = "normal"
) %>%
hc_colors(c("#0073C2FF", "#EFC000FF"))
stacked
First, I replace the dashes (‘-’) with 0. For each income range, I multiply the number of households by the approximate middle value of the range. For example, the middle value for the income range “$1,000 - $1,999” is taken to be $1,500.
For the lowest income range, I use $500 as the income level, while for the highest income range I use $22,000, an assumed value since no upper bound is given.
I then sum the total income across all income ranges and divide this by the total number of households to get the mean monthly household income.
This method is only approximate, since it assumes households are evenly spread within each range and the result depends on the value chosen for the open-ended top range, but it serves as an estimate of the mean monthly household income based on the data provided.
monthly_hh_income_nodash <- monthly_hh_income %>%
mutate_at(vars(2:22), list(~as.numeric(str_replace_all(.,'-','0'))))
monthly_hh_income_nodash <- monthly_hh_income_nodash %>% mutate(`subtotal` = `Below $1,000` * 500 +
`$1,000 - $1,999` * 1500 +
`$2,000 - $2,999` * 2500 +
`$3,000 - $3,999` * 3500 +
`$4,000 - $4,999` * 4500 +
`$5,000 - $5,999` * 5500 +
`$6,000 - $6,999` * 6500 +
`$7,000 - $7,999` * 7500 +
`$8,000 - $8,999` * 8500 +
`$9,000 - $9,999` * 9500 +
`$10,000 - $10,999` * 10500 +
`$11,000 - $11,999` * 11500 +
`$12,000 - $12,999` * 12500 +
`$13,000 - $13,999` * 13500 +
`$14,000 - $14,999` * 14500 +
`$15,000 - $17,499` * 16250 +
`$17,500 - $19,999` * 18750 +
`$20,000 & Over` * 22000
)
monthly_hh_income_nodash <- monthly_hh_income_nodash %>% mutate(`mean_income` = `subtotal`/`Total`)
# monthly_hh_income_nodash
mp_income <- left_join(mp, monthly_hh_income_nodash,
by = c("PLN_AREA_N" = "Planning Area"))
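The midpoint method described above can be checked on a toy example in base R. This is a minimal sketch with made-up household counts for a single hypothetical planning area and only four income ranges; here the total is taken as the sum of the counts, whereas the code above uses the Total column from the dataset.

```r
# Made-up household counts for one planning area; "-" marks a missing cell
counts_raw <- c("Below $1,000"    = "2",
                "$1,000 - $1,999" = "5",
                "$2,000 - $2,999" = "-",
                "$3,000 - $3,999" = "3")

# Assumed representative income for each range (approximate midpoints)
midpoints <- c(500, 1500, 2500, 3500)

# Treat a lone dash as zero households, then convert to numbers
counts <- as.numeric(ifelse(counts_raw == "-", "0", counts_raw))

# Total income implied by the midpoints, divided by the number of households
subtotal    <- sum(counts * midpoints)
mean_income <- subtotal / sum(counts)
mean_income
# 1900
```

Here the implied total income is 2 × 500 + 5 × 1500 + 0 × 2500 + 3 × 3500 = 19,000 across 10 households, giving an estimated mean of 1,900.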
The number of divorced/separated residents can give some insight into the number of vulnerable households in the planning area, so I will plot this on top of the choropleth map.
After loading the data, the Divorced/Separated column is read as a chr datatype, so I use the as.numeric() function to convert it to numeric data.
marital_status <- read_csv("data/marital_status.csv")
## Warning: Missing column names filled in: 'X3' [3], 'X4' [4], 'X6' [6], 'X7' [7],
## 'X9' [9], 'X10' [10], 'X12' [12], 'X13' [13], 'X15' [15], 'X16' [16]
##
## -- Column specification --------------------------------------------------------
## cols(
## `Planning Area` = col_character(),
## Total = col_character(),
## X3 = col_character(),
## X4 = col_character(),
## Single = col_character(),
## X6 = col_character(),
## X7 = col_character(),
## Married = col_character(),
## X9 = col_character(),
## X10 = col_character(),
## Widowed = col_character(),
## X12 = col_character(),
## X13 = col_character(),
## `Divorced/Separated` = col_character(),
## X15 = col_character(),
## X16 = col_character()
## )
divorce <- marital_status %>%
filter(`Planning Area` != "Total", !is.na(`Planning Area`)) %>%
rename("Divorced" = `Divorced/Separated`) %>%
mutate(`prop_divorce` = as.numeric(`Divorced`)/as.numeric(`Total`) * 100 ) %>%
mutate_at(.vars = vars(`Planning Area`), toupper)
divorce <- divorce %>%
mutate(`Divorced` = as.numeric(`Divorced`))
# divorce
mp_all <- left_join(mp_income, divorce,
by = c("PLN_AREA_N" = "Planning Area"))
# mp_all
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(mp_all)+
tm_fill("mean_income",
title = "Mean HH Income",
popup.vars=c("Total Households (Thousands)"="Total.x",
"Mean Household Income (S$)"="mean_income",
"% of Total Residents Divorced/Separated" = "prop_divorce", "Total Divorced/Separated (Thousands)" = "Divorced"
),
style = "quantile",
id=c("PLN_AREA_N"),
palette = "Blues") +
tm_borders(alpha = 0.5) +
tm_bubbles("prop_divorce",
size = "Divorced",
palette = "Greens",
border.col = "black",
popup.vars = c("% of Total Residents Divorced/Separated" = "prop_divorce", "Total Divorced/Separated (Thousands)" = "Divorced"),
id=c("PLN_AREA_N")) +
tm_text("PLN_AREA_C", size = "AREA") +
tm_legend(outside = TRUE) +
tm_layout(frame = FALSE) +
tm_view(view.legend.position = c("right", "bottom"))
## Text size will be constant in view mode. Set tm_view(text.size.variable = TRUE) to enable variable text sizes.
## Legend for symbol sizes not available in view mode.
From the treemap, we can see that planning areas in the central region tend to have a higher percentage of households earning at least S$20,000 a month, which is the highest income bracket. In addition, this percentage is eight to ten times higher for Bukit Timah and Tanglin compared to Punggol and Woodlands.
The stacked bar plot shows that planning areas such as Woodlands and Yishun have more than twice as many residents earning below $1,000 as residents earning $12,000 or more, and that this ratio is reversed for planning areas like Bukit Timah and Tanglin.
Based on the map, planning areas with a lower mean household income (indicated by the light blue colour) tend to have a higher percentage of residents who are divorced or separated (indicated by the darker green circle). The reverse is also true, where planning areas with a higher mean household income have fewer residents who are divorced or separated.
These findings show that there is indeed a disparity between planning areas in Singapore, both in terms of economic and non-economic indicators (such as marital status). This adds another dimension to the discussion on inequality in Singapore, and suggests that more can be done to reduce the disparity in economic and social outcomes between neighbourhoods.