The objective of this visualisation is to show the spatial income distribution by planning area on the Singapore map based on Master Plan 2014 Planning Area Boundary. There are 18 income bands (e.g. below $1,000, $1,000 - $1,999 etc.) which will be grouped into 5 income groups (Lower, Middle Lower, Middle, Middle Upper, Upper) based on 20 percentiles. These will be reflected in the interactive map as well.
Sketch:
knitr::include_graphics("image/Map.jpg")
The map will show the median income band of the planning areas using colour gradient. The darker the colour, the higher the income band. Viewers will be able to click on the individual planning areas to view further details such as:
- median income group which the median income band belongs to
- percentage breakdown across the income groups With these details, viewers will be able to understand the income distribution within each planning area as well as across the various planning areas.
Load the required packages for the creation of the map.
packages = c('sf', 'tmap', 'tidyverse','plotly')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
## Loading required package: sf
## Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
## Loading required package: tmap
## Loading required package: tidyverse
## ── Attaching packages ──────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.4
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: plotly
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Two datasets will be used to create the map, they are:
- URA Master Plan Planning Area boundary in shapefile format (i.e.MP14_PLNG_AREA_WEB_PL)
- Resident households by planning area & income group
mpPA <- st_read(dsn = "data",
layer = "MP14_PLNG_AREA_WEB_PL")
## Reading layer `MP14_PLNG_AREA_WEB_PL' from data source `/Users/jessicalee/Google Drive (peizhi.lee.2019@smu.edu.sg)/Visual Analytics/Week 9 14 Mar/Makeover 9/Makeover 09/data' using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=1.366666666666667 +lon_0=103.8333333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
incomeDist <- readxl::read_excel("data/Income distribution.xlsx")
Based on 20th percentiles of the total population, the following income bands are categorised into the respective income groups:
Lower: Below $1,000, $1,000 - $1,999, $2,000 - $2,999, $3,000 - $3,999
Middle Lower: $4,000 - $4,999, $5,000 - $5,999, $6,000 - $6,999
Middle: $7,000 - $7,999, $8,000 - $8,999, $9,000 - $9,999, $10,000 - $10,999
Middle Upper: $11,000 - $11,999, $12,000 - $12,999, $13,000 - $13,999, $14,000 - $14,999, $15,000 - $17,499
Upper: $17,500 - $19,999, $20,000 & Over
incomeDistGrp <- incomeDist %>%
mutate('Lower' =rowSums(.[4:7])) %>%
mutate('Middle Lower' =rowSums(.[8:10])) %>%
mutate('Middle' = rowSums(.[11:14])) %>%
mutate('Middle Upper' =rowSums(.[15:19])) %>%
mutate('Upper' = rowSums(.[20:21]))
For each planning area, compute the median income group into a new column.
incomeDisGrp2 <- select(incomeDistGrp, -c(2,3,22,23,24,25,26)) %>%
pivot_longer(-'Planning Area', values_to = "num", names_to = "var") %>%
group_by(`Planning Area`) %>%
mutate(cumsum = cumsum(num),
median.pos = sum(num)/2) %>%
mutate(median.cat = ifelse(median.pos <= cumsum, var, NA)) %>%
dplyr::filter(!is.na(median.cat)) %>%
slice(1)
Join the new column into the main income distribution dataset.
incomeDisGrp3 <-
select(incomeDisGrp2, c(`Planning Area`, `median.cat`))
incomeDistGrp4 <- left_join(incomeDistGrp, incomeDisGrp3,
by = "Planning Area")
Based on the median income band, map back to which income group it belongs to. The mapped income group to be in a new column.
incomeDistGrp4 ["incomeBand"] <- NA
for(i in 1:nrow(incomeDistGrp4))
incomeDistGrp4$incomeBand[i] <- if (incomeDistGrp4$median.cat[i] == "Below $1,000" | incomeDistGrp4$median.cat[i] == "$1,000 - $1,999" | incomeDistGrp4$median.cat[i] == "$2,000 - $2,999" | incomeDistGrp4$median.cat[i] == "$3,000 - $3,999") {
"Lower"} else {
if (incomeDistGrp4$median.cat[i] == "$4,000 - $4,999" | incomeDistGrp4$median.cat[i] == "$5,000 - $5,999" | incomeDistGrp4$median.cat[i] == "$6,000 - $6,999") {
"Lower Middle"
} else {
if (incomeDistGrp4$median.cat[i] == "$7,000 - $7,999" | incomeDistGrp4$median.cat[i] == "$8,000 - $8,999" | incomeDistGrp4$median.cat[i] == "$9,000 - $9,999" | incomeDistGrp4$median.cat[i] == "$10,000 - $10,999") {
"Middle"
} else {
if (incomeDistGrp4$median.cat[i] == "$11,000 - $11,999" | incomeDistGrp4$median.cat[i] == "$12,000 - $12,999" | incomeDistGrp4$median.cat[i] == "$13,000 - $13,999" | incomeDistGrp4$median.cat[i] == "$14,000 - $14,999" | incomeDistGrp4$median.cat[i] == "$15,000 - $17,499") {
"Middle Upper"
} else {
"Upper"
}
}
}
}
incomeDistGrp4 <- select(incomeDistGrp4, -c(2))
Compute the percentage breakdown for the different income groups for each planning area.
incomeDistGrp4 ["LowerPct"] <- NA
incomeDistGrp4 ["MiddleLowerPct"] <- NA
incomeDistGrp4 ["MiddlePct"] <- NA
incomeDistGrp4 ["MiddleUpperPct"] <- NA
incomeDistGrp4 ["UpperPct"] <- NA
incomeDistGrp4$LowerPct <- round((incomeDistGrp4$ Lower/rowSums(incomeDistGrp4[,21:25]))*100,digits=0)
incomeDistGrp4$MiddleLowerPct <- round((incomeDistGrp4$'Middle Lower'/rowSums(incomeDistGrp4[,21:25]))*100,digits=0)
incomeDistGrp4$MiddlePct <- round((incomeDistGrp4$Middle/rowSums(incomeDistGrp4[,21:25]))*100,digits=0)
incomeDistGrp4$MiddleUpperPct <- round((incomeDistGrp4$'Middle Upper'/rowSums(incomeDistGrp4[,21:25]))*100,digits=0)
incomeDistGrp4$UpperPct <- round((incomeDistGrp4$Upper/rowSums(incomeDistGrp4[,21:25]))*100,digits=0)
Left join the income dataset to the mapping file with planning area as the common identifier.
mpPA_income <- left_join(mpPA, incomeDistGrp4,
by = c("PLN_AREA_N"="Planning Area"))
## Warning: Column `PLN_AREA_N`/`Planning Area` joining factor and character
## vector, coercing into character vector
mpPA_income[is.na(mpPA_income)] <- 0
head(mpPA_income)
## Simple feature collection with 6 features and 43 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 17231.1 ymin: 26375.86 xmax: 31017.98 ymax: 44223.29
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=1.366666666666667 +lon_0=103.8333333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
## OBJECTID PLN_AREA_N PLN_AREA_C CA_IND REGION_N REGION_C
## 1 1 BISHAN BS N CENTRAL REGION CR
## 2 2 BUKIT BATOK BK N WEST REGION WR
## 3 3 BUKIT MERAH BM N CENTRAL REGION CR
## 4 4 BUKIT PANJANG BP N WEST REGION WR
## 5 5 BUKIT TIMAH BT N CENTRAL REGION CR
## 6 6 CENTRAL WATER CATCHMENT CC N NORTH REGION NR
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR SHAPE_Leng SHAPE_Area
## 1 BA616285F402846F 2014-12-05 28789.76 37450.89 13517.12 7618921
## 2 FB44C870B04B7F57 2014-12-05 19255.42 37527.65 15234.22 11133256
## 3 738B479882E4EE28 2014-12-05 26865.78 28662.87 29156.29 14462472
## 4 4A9C6E6BAF7BE998 2014-12-05 21287.04 38761.84 15891.85 9019940
## 5 C893AEAD20F42559 2014-12-05 23256.76 34689.00 22492.84 17526654
## 6 52D0068508B0348A 2014-12-05 24424.42 39849.05 30538.25 37147854
## No Working Person Below $1,000 $1,000 - $1,999 $2,000 - $2,999
## 1 2.6 0.5 1.3 1.1
## 2 4.6 0.6 2.4 2.4
## 3 10.8 2.1 6.1 3.6
## 4 2.3 0.5 1.7 2.2
## 5 2.9 0.3 0.7 0.4
## 6 0.0 0.0 0.0 0.0
## $3,000 - $3,999 $4,000 - $4,999 $5,000 - $5,999 $6,000 - $6,999
## 1 0.9 1.3 1.0 1.1
## 2 2.6 2.7 2.1 3.1
## 3 2.7 2.7 2.1 2.2
## 4 2.3 2.4 2.4 2.9
## 5 0.4 0.2 0.3 0.3
## 6 0.0 0.0 0.0 0.0
## $7,000 - $7,999 $8,000 - $8,999 $9,000 - $9,999 $10,000 - $10,999
## 1 1.0 1.7 1.0 0.9
## 2 2.3 2.4 1.8 2.0
## 3 2.3 1.6 1.8 1.7
## 4 2.9 2.5 2.4 2.3
## 5 0.7 0.7 0.2 0.9
## 6 0.0 0.0 0.0 0.0
## $11,000 - $11,999 $12,000 - $12,999 $13,000 - $13,999 $14,000 - $14,999
## 1 1.4 1.2 1.2 0.9
## 2 1.8 1.7 1.0 1.6
## 3 1.4 1.6 1.4 1.7
## 4 1.5 1.8 1.8 1.2
## 5 0.4 0.7 0.4 0.5
## 6 0.0 0.0 0.0 0.0
## $15,000 - $17,499 $17,500 - $19,999 $20,000 & Over Lower Middle Lower Middle
## 1 1.3 1.8 5.5 3.8 3.4 4.6
## 2 2.1 1.4 5.3 8.0 7.9 8.5
## 3 2.3 1.6 5.4 14.5 7.0 7.4
## 4 2.2 1.6 4.4 6.7 7.7 10.1
## 5 1.2 1.9 10.6 1.8 0.8 2.5
## 6 0.0 0.0 0.0 0.0 0.0 0.0
## Middle Upper Upper median.cat incomeBand LowerPct MiddleLowerPct
## 1 6.0 7.3 $11,000 - $11,999 Middle Upper 15 14
## 2 8.2 6.7 $8,000 - $8,999 Middle 20 20
## 3 8.4 7.0 $7,000 - $7,999 Middle 33 16
## 4 8.5 6.0 $8,000 - $8,999 Middle 17 20
## 5 3.2 12.5 $20,000 & Over Upper 9 4
## 6 0.0 0.0 0 0 0 0
## MiddlePct MiddleUpperPct UpperPct geometry
## 1 18 24 29 MULTIPOLYGON (((29772.19 38...
## 2 22 21 17 MULTIPOLYGON (((20294.46 39...
## 3 17 19 16 MULTIPOLYGON (((26228.63 30...
## 4 26 22 15 MULTIPOLYGON (((21448.72 41...
## 5 12 15 60 MULTIPOLYGON (((24031.39 36...
## 6 0 0 0 MULTIPOLYGON (((25073.29 43...
Assign an order to the income band categories so that it will appear in the right sequence in the map as we will use colour gradient to show the different sequential levels of income.
mpPA_income$median.cat <- factor(mpPA_income$median.cat, levels = c("$1,000 & below","$1,000 - $1,999","$2,000 - $2,999", "$3,000 - $3,999", "$4,000 - $4,999", "$5,000 - $5,999", "$6,000 - $6,999", "$7,000 - $7,999","$8,000 - $8,999", " $9,000 - $9,999", " $10,000 - $10,999","$11,000 - $11,999","$12,000 - $12,999", "$13,000 - $13,999","$14,000 - $14,999","$15,000 - $17,499","$17,500 - $19,999","$20,000 & Over"))
In the code chunk below, the colour of the planning area will be based on the median category for that particular planning area. The map is interactive as user will be able to click on the planning area and see the detailed breakdown of the median income group as well as percentage breakdown by the 5 income groups. This will show how skewed the income distribution is for the particular planning area.
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(mpPA_income) +
tm_fill(col = "median.cat",
title = "Income Bands",
palette = "Blues",
id = "PLN_AREA_N",
popup.vars = c("Median Income Band" = "median.cat", "Median Income Group" = "incomeBand", "Lower%" ="LowerPct","MiddleLower%" ="MiddleLowerPct","Middle%" ="MiddlePct","MiddleUpper%" ="MiddleUpperPct","Upper%" ="UpperPct"),
colorNA = NULL) +
tm_borders(alpha = 0.5) +
tm_layout(title = "Singapore Income Distribution by Planning Area",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
## Warning: The shape mpPA_income is invalid. See sf::st_is_valid
To better visualise the income group breakdown in chart form for each planning area. An interactive faceted bar chart is generated to show the skewness of income distribution for each planning area. We will be able to observe which planning areas have higher population belonging to the higher income group vs. lower income group.
IncBands <- select(incomeDistGrp4, c(`Planning Area`,`Lower`,`Middle Lower`,`Middle`,`Middle Upper`,`Upper`))
IncBands <- IncBands[-c(1), ]
IncBands <- gather(IncBands, "IncomeGrp","Population",2:6)
IncBands$IncomeGrp <- factor(IncBands$IncomeGrp, levels = c("Lower","Middle Lower","Middle","Middle Upper","Upper"))
IBplots<- ggplot(IncBands, aes(x = IncomeGrp, y = Population, fill = IncomeGrp)) +
geom_bar(stat = "identity") +
scale_fill_brewer("Income Group") +
scale_y_discrete(name="Population (in thousands)") +
scale_x_discrete(name="") +
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.text.x = element_text(size = 7)) +
facet_wrap(vars(`Planning Area`))
ggplotly(IBplots)
Based on the interactive map and bar charts in section 4.0 & 5.0, we observe the following:
1. Prime locations with more landed properties like Bukit Timah & Tanglin’s median income band is $20,000 & over while older estates that consist mainly of older HDBs such as Queenstown, Bukit merah and especially Outram have lower median income bands.
2. The bar chart allows us to also compare how populus each planning area is. This gives us perspective on the absolute numbers under each income group. For example, Bedok and Tampines are highly dense areas and it is interesting to note that the income group distribution is inversed. This is probably because Bedok has more landed and private condominums than Tampines.
3. For non-mature estates, we can observe that areas closer to the city reflect a higher proportion of “middle upper” and “upper” income groups. For example, Sengkang shows a higher proportion of “middle upper” and “upper” income group compared to punggol.
1. Interactivity allows users to view greater detail for specific areas of the map / chart whereas static will probably need multiple visuals put together on the same page to deliver the same message.
2. Interactivity allows for customisation of the popup details according to what the users would be interested in whereas static would not allow such customisation as it does not have popup function.
3. Interactivity allows for multi-dimension detail of information whereas static would be constrained by the single chart / map’s 2D format limitations.
Ideally, it would be better if the respective bar chart could appear when user hovers or clicks on the various planning area on the interactive map. However, tmap does not allow for popup charts but only allow for popup variables. A quick browse of the possible R packages that would allow for such visual would be highcharter. An example here: http://jkunst.com/blog/posts/2019-02-04-using-tooltips-in-unexpected-ways/