1.0 Overview of Makeover 09

1.1 Objective of proposed data visualisation

The objective of this visualisation is to show the spatial income distribution by planning area on the Singapore map based on Master Plan 2014 Planning Area Boundary. There are 18 income bands (e.g. below $1,000, $1,000 - $1,999 etc.) which will be grouped into 5 income groups (Lower, Middle Lower, Middle, Middle Upper, Upper) based on 20 percentiles. These will be reflected in the interactive map as well.

Sketch:

knitr::include_graphics("image/Map.jpg")

1.2 Incorporating Interactivity

The map will show the median income band of the planning areas using colour gradient. The darker the colour, the higher the income band. Viewers will be able to click on the individual planning areas to view further details such as:
- median income group which the median income band belongs to
- percentage breakdown across the income groups With these details, viewers will be able to understand the income distribution within each planning area as well as across the various planning areas.

2.0 Load the required packages

Load the required packages for the creation of the map.

packages = c('sf', 'tmap', 'tidyverse','plotly')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}
## Loading required package: sf
## Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
## Loading required package: tmap
## Loading required package: tidyverse
## ── Attaching packages ──────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Loading required package: plotly
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

3.0 Geospatial Data Wrangling

3.1 The data

Two datasets will be used to create the map, they are:
- URA Master Plan Planning Area boundary in shapefile format (i.e.MP14_PLNG_AREA_WEB_PL)
- Resident households by planning area & income group

mpPA <- st_read(dsn = "data", 
                layer = "MP14_PLNG_AREA_WEB_PL")
## Reading layer `MP14_PLNG_AREA_WEB_PL' from data source `/Users/jessicalee/Google Drive (peizhi.lee.2019@smu.edu.sg)/Visual Analytics/Week 9 14 Mar/Makeover 9/Makeover 09/data' using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=1.366666666666667 +lon_0=103.8333333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
incomeDist <- readxl::read_excel("data/Income distribution.xlsx")

3.2 Data Preparation

3.2.1 Create Income Groups

Based on 20th percentiles of the total population, the following income bands are categorised into the respective income groups:
Lower: Below $1,000, $1,000 - $1,999, $2,000 - $2,999, $3,000 - $3,999
Middle Lower: $4,000 - $4,999, $5,000 - $5,999, $6,000 - $6,999
Middle: $7,000 - $7,999, $8,000 - $8,999, $9,000 - $9,999, $10,000 - $10,999
Middle Upper: $11,000 - $11,999, $12,000 - $12,999, $13,000 - $13,999, $14,000 - $14,999, $15,000 - $17,499
Upper: $17,500 - $19,999, $20,000 & Over

incomeDistGrp <- incomeDist %>%
  mutate('Lower' =rowSums(.[4:7])) %>%
  mutate('Middle Lower' =rowSums(.[8:10])) %>%
  mutate('Middle' = rowSums(.[11:14])) %>%
  mutate('Middle Upper' =rowSums(.[15:19])) %>%
  mutate('Upper' = rowSums(.[20:21]))

3.2.2 Compute Median Income Group per Planning Area

For each planning area, compute the median income group into a new column.

incomeDisGrp2 <- select(incomeDistGrp, -c(2,3,22,23,24,25,26)) %>%
  pivot_longer(-'Planning Area', values_to = "num", names_to = "var") %>%
  group_by(`Planning Area`) %>%
  mutate(cumsum = cumsum(num),
         median.pos = sum(num)/2) %>%
  mutate(median.cat = ifelse(median.pos <= cumsum, var, NA)) %>%
  dplyr::filter(!is.na(median.cat)) %>%
  slice(1)

3.2.3 Join the Median Income Band column to the main dataset

Join the new column into the main income distribution dataset.

incomeDisGrp3 <-
  select(incomeDisGrp2, c(`Planning Area`, `median.cat`))

incomeDistGrp4 <- left_join(incomeDistGrp, incomeDisGrp3,
                           by = "Planning Area")

3.2.4 Categorise Median Income Group

Based on the median income band, map back to which income group it belongs to. The mapped income group to be in a new column.

incomeDistGrp4 ["incomeBand"] <- NA

for(i in 1:nrow(incomeDistGrp4))
  incomeDistGrp4$incomeBand[i] <- if (incomeDistGrp4$median.cat[i] == "Below $1,000" | incomeDistGrp4$median.cat[i] == "$1,000 - $1,999" | incomeDistGrp4$median.cat[i] == "$2,000 - $2,999" | incomeDistGrp4$median.cat[i] == "$3,000 - $3,999") {
    "Lower"} else {
    if (incomeDistGrp4$median.cat[i] == "$4,000 - $4,999" | incomeDistGrp4$median.cat[i] == "$5,000 - $5,999" | incomeDistGrp4$median.cat[i] == "$6,000 - $6,999") {
      "Lower Middle"
    } else {
      if (incomeDistGrp4$median.cat[i] == "$7,000 - $7,999" | incomeDistGrp4$median.cat[i] == "$8,000 - $8,999" | incomeDistGrp4$median.cat[i] == "$9,000 - $9,999" | incomeDistGrp4$median.cat[i] == "$10,000 - $10,999") {
      "Middle"
      } else {
      if (incomeDistGrp4$median.cat[i] == "$11,000 - $11,999" | incomeDistGrp4$median.cat[i] == "$12,000 - $12,999" | incomeDistGrp4$median.cat[i] == "$13,000 - $13,999" | incomeDistGrp4$median.cat[i] == "$14,000 - $14,999" | incomeDistGrp4$median.cat[i] == "$15,000 - $17,499") {
      "Middle Upper"
      } else {
        "Upper"
    }
  }
}
    }

incomeDistGrp4 <- select(incomeDistGrp4, -c(2))

3.2.5 Add percentage columns for the different Income Groups for each planning area

Compute the percentage breakdown for the different income groups for each planning area.

incomeDistGrp4 ["LowerPct"] <- NA
incomeDistGrp4 ["MiddleLowerPct"] <- NA
incomeDistGrp4 ["MiddlePct"] <- NA
incomeDistGrp4 ["MiddleUpperPct"] <- NA
incomeDistGrp4 ["UpperPct"] <- NA

incomeDistGrp4$LowerPct <- round((incomeDistGrp4$ Lower/rowSums(incomeDistGrp4[,21:25]))*100,digits=0)

incomeDistGrp4$MiddleLowerPct <- round((incomeDistGrp4$'Middle Lower'/rowSums(incomeDistGrp4[,21:25]))*100,digits=0)

incomeDistGrp4$MiddlePct <- round((incomeDistGrp4$Middle/rowSums(incomeDistGrp4[,21:25]))*100,digits=0)

incomeDistGrp4$MiddleUpperPct <- round((incomeDistGrp4$'Middle Upper'/rowSums(incomeDistGrp4[,21:25]))*100,digits=0)

incomeDistGrp4$UpperPct <- round((incomeDistGrp4$Upper/rowSums(incomeDistGrp4[,21:25]))*100,digits=0)

3.2.6 Join mapping file with the Income Group dataset

Left join the income dataset to the mapping file with planning area as the common identifier.

mpPA_income <- left_join(mpPA, incomeDistGrp4,
                         by = c("PLN_AREA_N"="Planning Area"))
## Warning: Column `PLN_AREA_N`/`Planning Area` joining factor and character
## vector, coercing into character vector
mpPA_income[is.na(mpPA_income)] <- 0

head(mpPA_income)
## Simple feature collection with 6 features and 43 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 17231.1 ymin: 26375.86 xmax: 31017.98 ymax: 44223.29
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=1.366666666666667 +lon_0=103.8333333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
##   OBJECTID              PLN_AREA_N PLN_AREA_C CA_IND       REGION_N REGION_C
## 1        1                  BISHAN         BS      N CENTRAL REGION       CR
## 2        2             BUKIT BATOK         BK      N    WEST REGION       WR
## 3        3             BUKIT MERAH         BM      N CENTRAL REGION       CR
## 4        4           BUKIT PANJANG         BP      N    WEST REGION       WR
## 5        5             BUKIT TIMAH         BT      N CENTRAL REGION       CR
## 6        6 CENTRAL WATER CATCHMENT         CC      N   NORTH REGION       NR
##            INC_CRC FMEL_UPD_D   X_ADDR   Y_ADDR SHAPE_Leng SHAPE_Area
## 1 BA616285F402846F 2014-12-05 28789.76 37450.89   13517.12    7618921
## 2 FB44C870B04B7F57 2014-12-05 19255.42 37527.65   15234.22   11133256
## 3 738B479882E4EE28 2014-12-05 26865.78 28662.87   29156.29   14462472
## 4 4A9C6E6BAF7BE998 2014-12-05 21287.04 38761.84   15891.85    9019940
## 5 C893AEAD20F42559 2014-12-05 23256.76 34689.00   22492.84   17526654
## 6 52D0068508B0348A 2014-12-05 24424.42 39849.05   30538.25   37147854
##   No Working Person Below $1,000 $1,000 - $1,999 $2,000 - $2,999
## 1               2.6          0.5             1.3             1.1
## 2               4.6          0.6             2.4             2.4
## 3              10.8          2.1             6.1             3.6
## 4               2.3          0.5             1.7             2.2
## 5               2.9          0.3             0.7             0.4
## 6               0.0          0.0             0.0             0.0
##   $3,000 - $3,999 $4,000 - $4,999 $5,000 - $5,999 $6,000 - $6,999
## 1             0.9             1.3             1.0             1.1
## 2             2.6             2.7             2.1             3.1
## 3             2.7             2.7             2.1             2.2
## 4             2.3             2.4             2.4             2.9
## 5             0.4             0.2             0.3             0.3
## 6             0.0             0.0             0.0             0.0
##   $7,000 - $7,999 $8,000 - $8,999 $9,000 - $9,999 $10,000 - $10,999
## 1             1.0             1.7             1.0               0.9
## 2             2.3             2.4             1.8               2.0
## 3             2.3             1.6             1.8               1.7
## 4             2.9             2.5             2.4               2.3
## 5             0.7             0.7             0.2               0.9
## 6             0.0             0.0             0.0               0.0
##   $11,000 - $11,999 $12,000 - $12,999 $13,000 - $13,999 $14,000 - $14,999
## 1               1.4               1.2               1.2               0.9
## 2               1.8               1.7               1.0               1.6
## 3               1.4               1.6               1.4               1.7
## 4               1.5               1.8               1.8               1.2
## 5               0.4               0.7               0.4               0.5
## 6               0.0               0.0               0.0               0.0
##   $15,000 - $17,499 $17,500 - $19,999 $20,000 & Over Lower Middle Lower Middle
## 1               1.3               1.8            5.5   3.8          3.4    4.6
## 2               2.1               1.4            5.3   8.0          7.9    8.5
## 3               2.3               1.6            5.4  14.5          7.0    7.4
## 4               2.2               1.6            4.4   6.7          7.7   10.1
## 5               1.2               1.9           10.6   1.8          0.8    2.5
## 6               0.0               0.0            0.0   0.0          0.0    0.0
##   Middle Upper Upper        median.cat   incomeBand LowerPct MiddleLowerPct
## 1          6.0   7.3 $11,000 - $11,999 Middle Upper       15             14
## 2          8.2   6.7   $8,000 - $8,999       Middle       20             20
## 3          8.4   7.0   $7,000 - $7,999       Middle       33             16
## 4          8.5   6.0   $8,000 - $8,999       Middle       17             20
## 5          3.2  12.5    $20,000 & Over        Upper        9              4
## 6          0.0   0.0                 0            0        0              0
##   MiddlePct MiddleUpperPct UpperPct                       geometry
## 1        18             24       29 MULTIPOLYGON (((29772.19 38...
## 2        22             21       17 MULTIPOLYGON (((20294.46 39...
## 3        17             19       16 MULTIPOLYGON (((26228.63 30...
## 4        26             22       15 MULTIPOLYGON (((21448.72 41...
## 5        12             15       60 MULTIPOLYGON (((24031.39 36...
## 6         0              0        0 MULTIPOLYGON (((25073.29 43...

3.2.7 Add order to the income bands

Assign an order to the income band categories so that it will appear in the right sequence in the map as we will use colour gradient to show the different sequential levels of income.

mpPA_income$median.cat <- factor(mpPA_income$median.cat, levels = c("$1,000 & below","$1,000 - $1,999","$2,000 - $2,999", "$3,000 - $3,999", "$4,000 - $4,999", "$5,000 - $5,999", "$6,000 - $6,999", "$7,000 - $7,999","$8,000 - $8,999", " $9,000 - $9,999", " $10,000 - $10,999","$11,000 - $11,999","$12,000 - $12,999", "$13,000 - $13,999","$14,000 - $14,999","$15,000 - $17,499","$17,500 - $19,999","$20,000 & Over"))

4.0 Choropleth Mapping Geospatial Data Using tmap

4.1 Using tmap to build an interactive map

In the code chunk below, the colour of the planning area will be based on the median category for that particular planning area. The map is interactive as user will be able to click on the planning area and see the detailed breakdown of the median income group as well as percentage breakdown by the 5 income groups. This will show how skewed the income distribution is for the particular planning area.

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(mpPA_income) +
  tm_fill(col = "median.cat",
          title = "Income Bands",
          palette = "Blues",
          id = "PLN_AREA_N",
          popup.vars = c("Median Income Band" = "median.cat", "Median Income Group" = "incomeBand", "Lower%" ="LowerPct","MiddleLower%" ="MiddleLowerPct","Middle%" ="MiddlePct","MiddleUpper%" ="MiddleUpperPct","Upper%" ="UpperPct"),
          colorNA = NULL) +
  tm_borders(alpha = 0.5) +
  tm_layout(title = "Singapore Income Distribution by Planning Area",
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE)
## Warning: The shape mpPA_income is invalid. See sf::st_is_valid

5.0 Interactive bar chart showing income distribution for each planning area

To better visualise the income group breakdown in chart form for each planning area. An interactive faceted bar chart is generated to show the skewness of income distribution for each planning area. We will be able to observe which planning areas have higher population belonging to the higher income group vs. lower income group.

IncBands <- select(incomeDistGrp4, c(`Planning Area`,`Lower`,`Middle Lower`,`Middle`,`Middle Upper`,`Upper`))
                   
IncBands <- IncBands[-c(1), ]

IncBands <- gather(IncBands, "IncomeGrp","Population",2:6)

IncBands$IncomeGrp <- factor(IncBands$IncomeGrp, levels = c("Lower","Middle Lower","Middle","Middle Upper","Upper"))
                   
IBplots<- ggplot(IncBands, aes(x = IncomeGrp, y = Population, fill = IncomeGrp)) + 
  geom_bar(stat = "identity") +
  scale_fill_brewer("Income Group") +
  scale_y_discrete(name="Population (in thousands)") +
  scale_x_discrete(name="") +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        strip.text.x = element_text(size = 7)) +
  facet_wrap(vars(`Planning Area`))

ggplotly(IBplots)

6.0 Insights: 3 Useful information revealed by data visualisation

Based on the interactive map and bar charts in section 4.0 & 5.0, we observe the following:
1. Prime locations with more landed properties like Bukit Timah & Tanglin’s median income band is $20,000 & over while older estates that consist mainly of older HDBs such as Queenstown, Bukit merah and especially Outram have lower median income bands.
2. The bar chart allows us to also compare how populus each planning area is. This gives us perspective on the absolute numbers under each income group. For example, Bedok and Tampines are highly dense areas and it is interesting to note that the income group distribution is inversed. This is probably because Bedok has more landed and private condominums than Tampines.
3. For non-mature estates, we can observe that areas closer to the city reflect a higher proportion of “middle upper” and “upper” income groups. For example, Sengkang shows a higher proportion of “middle upper” and “upper” income group compared to punggol.

7.0 Reflection: 3 Advantages of incorporating interactivity vs. static visual


1. Interactivity allows users to view greater detail for specific areas of the map / chart whereas static will probably need multiple visuals put together on the same page to deliver the same message.
2. Interactivity allows for customisation of the popup details according to what the users would be interested in whereas static would not allow such customisation as it does not have popup function.
3. Interactivity allows for multi-dimension detail of information whereas static would be constrained by the single chart / map’s 2D format limitations.

8.0 Future enhancements to the interactive map

Ideally, it would be better if the respective bar chart could appear when user hovers or clicks on the various planning area on the interactive map. However, tmap does not allow for popup charts but only allow for popup variables. A quick browse of the possible R packages that would allow for such visual would be highcharter. An example here: http://jkunst.com/blog/posts/2019-02-04-using-tooltips-in-unexpected-ways/