1 Introduction

The aim of this visualisation is to better understand the transportation patterns of residents in Singapore, and to investigate how the main mode of transport to work (private vs. public transport) differs spatially across planning areas. The spatial distribution of public transport infrastructure such as bus stops and train stations will also be analysed in conjunction with work travel mode choice, to enable a more comprehensive analysis.

Studying the main mode of transport to work across different areas will enable urban planners to better understand the travel behaviour of residents, especially during peak hours when residents travel to and from work. This supports more evidence-based decisions in land and transportation planning.

In addition, as Singapore moves towards becoming a more sustainable, car-lite nation, increasing emphasis is being placed on encouraging the use of public transport such as buses and trains over private vehicles. Analysing how the use of public and private transport varies across planning areas will therefore help government agencies to pinpoint areas for improvement and develop appropriate policies.

The Singapore government conducts a General Household Survey (GHS) in the middle of each decade. The data is made publicly available by Singapore’s Department of Statistics (https://www.singstat.gov.sg/) as part of Singapore’s open data initiative. As the data is published in tabular form, which makes spatial patterns difficult to discern, this visualisation uses data from the latest General Household Survey, conducted in 2015, to draw insights.

Modes of transport will be discussed and visualised as three main categories:

Travel by private transport: defined as travel by car, taxi and motorcycle
Travel by public transport: defined as travel by MRT, public bus, or both
Travel by other modes of transport (e.g. private chartered bus/van, lorry/pickup)

2 Data

Data from a variety of sources will be utilised for this visualisation. The table below outlines the data sets that are used.

Data	Description	Format	Source
Planning Area	Planning area boundary in Singapore based on Master Plan 2014	Shapefile	data.gov.sg
Mode of Transport to Work	Data extracted from General Household Survey 2015, on resident working persons aged 15 years and over by planning area and usual mode of transport to work (T146)	CSV	Department of Statistics Singapore
Bus Stops	Bus stop locations	Shapefile	LTA DataMall
Train Stations	Locations of MRT stations	Shapefile	LTA DataMall

3 Data and Design Challenges

Several data and design challenges arise in analysing modes of transport to work and in designing the data visualisation. This section outlines these challenges and the approach proposed to overcome each of them.

3.1 Challenge 1: Extracting data

Challenge

The raw data on mode of transport to work from GHS 2015 cannot be used directly, as it is spread across separate Excel tables. Hence, this data needs to be extracted and prepared in a format that can be read by R as a data table.

Mitigation

Data from GHS 2015 is consolidated into a single CSV file, which is then imported into R as a data frame.

3.2 Challenge 2: Visualising main modes of transport spatially across planning areas

Challenge

While the data from GHS 2015 provides information on the modes of transport to work for the different planning areas, the data is aspatial and cannot be mapped directly. However, there is much value in visualising this information spatially, especially since the data is already broken down by geographical unit (planning area). Spatial mapping will enable a better visual understanding of how the main modes of transport to work vary across Singapore.

Mitigation

Since the planning areas used in the survey are based on URA’s Master Plan 2014, the spatial data set containing indicative polygon boundaries of planning areas in Singapore from Master Plan 2014 is retrieved from data.gov.sg. This spatial data is then joined with the attribute data from GHS 2015, allowing the attribute information on main modes of transport to be mapped across planning areas.

3.3 Challenge 3: Analysing availability of public transport infrastructure in conjunction with mode of transport to work

Challenge

When analysing public transport as a main mode of transport, it is useful to analyse the level of public transport travel in each planning area in conjunction with the public transport infrastructure available there. Therefore, the availability of public transport infrastructure will also be visualised spatially. However, this information is not available in the modes of transport data.

Mitigation

To bridge the data gap, location data on bus stops and train stations is retrieved from LTA DataMall as shapefiles. To visualise the availability of public transport infrastructure in the different planning areas spatially, a measure called the Public Transport Density is developed, which is the total number of public transport facilities (i.e. bus stops and train stations) per square kilometre, for each planning area.

\[Public\ transport\ density\ (number\ per\ square\ kilometre) = \frac{Number\ of\ bus\ stops\ + Number\ of\ train\ stations}{Area\ of\ planning\ area\ (km^2)}\]

To obtain the counts of bus stops and train stations in each planning area, st_intersects() from the sf package is used to count the number of points that fall within each polygon. The combined count is then divided by the area of the planning area polygon to obtain the public transport density for each planning area.
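As a minimal sketch of this counting step, the example below builds two toy rectangular "planning areas" and a handful of made-up points, then derives the density. All coordinates and names (`square()`, `areas`, `stops`, `stations`) are hypothetical and only illustrate the technique; no CRS is set, so the coordinates are simply treated as metres.

```r
library(sf)

# Helper (hypothetical): axis-aligned rectangle as an sf polygon
square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Two toy areas: A is 1 km x 1 km, B is 2 km x 1 km (coordinates in metres)
areas <- st_sf(
  name = c("A", "B"),
  geometry = st_sfc(square(0, 0, 1000, 1000), square(1000, 0, 3000, 1000))
)

# Made-up bus stop and train station locations
stops <- st_sfc(
  st_point(c(100, 100)), st_point(c(500, 500)),
  st_point(c(1500, 500)), st_point(c(2500, 200)), st_point(c(2000, 800))
)
stations <- st_sfc(st_point(c(600, 600)), st_point(c(2500, 500)))

# st_intersects() returns, for each polygon, the indices of intersecting points;
# lengths() turns that list into a count per polygon
areas$num_bus_stops <- lengths(st_intersects(areas, stops))
areas$num_train_stations <- lengths(st_intersects(areas, stations))

# Convert area from square metres to square kilometres, then compute density
areas$area_km2 <- as.numeric(st_area(areas)) / 1e6
areas$density <- (areas$num_bus_stops + areas$num_train_stations) / areas$area_km2
```

In this toy set-up, area A holds 2 stops and 1 station over 1 km², giving a density of 3, while area B holds 3 stops and 1 station over 2 km², giving a density of 2.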

To visualise the choice of public transport as a main transport mode in conjunction with the availability of public transport infrastructure, thematic mapping will be used. A choropleth map will show the percentage of residents that use public transport as their main mode of transport across the different planning areas in Singapore. A proportional symbol map with graduated colours, showing public transport density across the planning areas, will then be overlaid on top of the choropleth map so that the two pieces of information can be analysed together.

3.4 Challenge 4: Enabling comparison between public and private transportation as main mode of transport

Challenge

To enable a better comparison between the usage of public and private transport as the main mode of transport to work across the planning areas, the two maps have to be placed side-by-side.

Mitigation

To allow for easy comparison, tmap_arrange() from the tmap package is used to place the two maps beside each other. Syncing is also enabled such that when the reader pans or zooms on one map, the other map moves together in sync. This will enable easy referencing, making the visualisation more reader-friendly and interactive.

3.5 Challenge 5: Visualising distribution of main modes of transport in Singapore

Challenge

The data on main modes of transport has up to 11 categories for choice of transport, namely:

Public Bus Only
MRT Only
MRT & Public Bus Only
Other Combinations of MRT or Public Bus
Taxi Only
Car Only
Private Chartered Bus/Van Only
Lorry/Pickup Only
Motorcycle/Scooter Only
Others
No Transport Required

With so many categories, it may be difficult to visualise and compare the data to derive useful insights.

Mitigation

To mitigate the problem mentioned above, the modes of transport categories will be grouped into three overarching categories for better analysis:

Public transport: categories 1 - 4
Private transport: categories 5, 6, 9
Other modes of transport: all other categories

To show how the proportions of work travel mode choice vary across planning areas, a ternary scatter plot will be constructed. Having three overarching categories, whose shares sum to a constant, makes the data suitable for visualisation on a ternary plot.
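The grouping can be illustrated with a short base R sketch. It uses the Ang Mo Kio figures (in thousands) that appear in the imported data later in this document; the object names (`amk`, `group`, `perc`) are hypothetical and chosen only for this example.

```r
# Ang Mo Kio counts (in thousands) for the 11 original categories
amk <- c(
  "Public Bus Only" = 16.6, "MRT Only" = 11.9, "MRT & Public Bus Only" = 26.7,
  "Other Combinations of MRT or Public Bus" = 4.4, "Taxi Only" = 1.1,
  "Car Only" = 20.7, "Private Chartered Bus/Van Only" = 2.4,
  "Lorry/Pickup Only" = 2.3, "Motorcycle/Scooter Only" = 3.0,
  "Others" = 1.4, "No Transport Required" = 10.3
)
total <- 100.9

# Overarching group for each category, in the order listed above:
# categories 1-4 are public; 5, 6 and 9 are private; the rest are other
group <- c(rep("public", 4), "private", "private",
           "other", "other", "private", "other", "other")

# Sum the counts within each group and express them as a share of the total
group_totals <- tapply(amk, group, sum)
perc <- round(group_totals / total * 100, 1)
```

For Ang Mo Kio this gives 59.1% public, 24.6% private and 16.3% other, three shares that sum to 100 and can therefore be placed as a single point on a ternary plot.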

3.6 Challenge 6: Selecting the number of data classes

Challenge

For thematic choropleth mapping, one of the most fundamental decisions is the number of data classes to use, as it can dramatically change the look and message of the map. More data classes mean less data generalisation, which is beneficial but comes at the expense of legibility and raises the risk of map reading errors, since more colours are used. On the other hand, too few data classes may lead to the loss of important spatial patterns.

Mitigation

In order to decide on the optimal number of classes to be utilised for the visualisation, exploratory data analysis is conducted. The table below shows the summary statistics of the two variables of interest for the spatial visualisation: percentage of residents who travel by public transport as their main mode of transport, and percentage of residents who travel by private transport as their main mode of transport.

Variable	Minimum	Mean	Maximum
Percentage of public transport travel	28.10	56.52	65.60
Percentage of private transport travel	14.50	28.74	62.90

Based on the summary statistics, 5 data classes are chosen for public transport (20% to 70%) and 6 data classes are chosen for private transport (10% to 70%), so that the data is grouped in intervals of 10%. “Pretty” breaks will be used so that the bin boundaries are whole numbers, for easier interpretation.
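The class counts can be checked with base R's pretty(), used here as a stand-in for the "pretty" classification style; that the mapping package produces the same breaks is an assumption. The variable names are hypothetical, and the ranges are the minimum and maximum values from the summary table above.

```r
# Minimum and maximum from the summary statistics table
range_public <- c(28.10, 65.60)
range_private <- c(14.50, 62.90)

# pretty() picks round break points that cover the range
breaks_public <- pretty(range_public, n = 5)
breaks_private <- pretty(range_private, n = 6)

# The number of classes is one fewer than the number of break points
n_classes_public <- length(breaks_public) - 1
n_classes_private <- length(breaks_private) - 1
```

This yields breaks of 20, 30, ..., 70 for public transport (5 classes) and 10, 20, ..., 70 for private transport (6 classes).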

3.7 Challenge 7: Making interactive visualisations

Challenge

When designing interactive charts, attention also has to be paid to tooltips for a better interactive experience.

Mitigation

Hence, the tooltips shown on mouse-over and on click will be formatted in a reader-friendly manner, so that they are easy to understand and contain the essential information the reader needs to interpret the charts.

For the choropleth maps, hovering over a planning area will display its name. On click, a pop-up will display the percentage of residents who travel by the particular mode of transport. For the proportional symbol map, hovering over a symbol will display the name of the planning area, while clicking on it will display the public transport density, number of bus stops and number of train stations in that planning area. As for the ternary chart, the tooltip for each data point (planning area) will display the relative composition (percentages) of the three modes of transport for that planning area, together with the planning area name.
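One possible way to build such tooltip text is sketched below in base R. The helper `make_tooltip()` is hypothetical; it relies on the fact that plotly hover text accepts a small set of HTML tags such as `<br>` and `<b>`. The percentages are those derived for Ang Mo Kio in the data wrangling section.

```r
# Hypothetical helper that builds one tooltip string per planning area
make_tooltip <- function(name, public, private, others) {
  sprintf(
    "<b>%s</b><br>Public transport: %.1f%%<br>Private transport: %.1f%%<br>Other modes: %.1f%%",
    name, public, private, others
  )
}

# Percentages for Ang Mo Kio
tooltip <- make_tooltip("ANG MO KIO", 59.1, 24.6, 16.3)
```

Because sprintf() is vectorised, the same helper can be applied to whole columns to produce one tooltip per planning area.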

4 Sketch of Proposed Design

5 Step-by-Step Data Visualisation

5.1 Install packages

The following packages will be installed for the creation of the data visualisation:

tidyverse: contains a collection of R packages for data importing and manipulation
sf: for spatial data manipulation
tmap: to create spatial thematic maps
plotly: to create interactive charts

packages <- c('tidyverse', 'sf', 'tmap', 'plotly')

for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

5.2 Import data

The data sets on mode of transport, planning area, bus stops and train stations will be imported for the creation of the data visualisation.

5.2.1 Mode of transport to work

Import

transport <- read_csv('data/aspatial/ghs15_transport.csv')

## Parsed with column specification:
## cols(
##   `Planning Area` = col_character(),
##   Total = col_number(),
##   `Public Bus Only` = col_double(),
##   `MRT Only` = col_double(),
##   `MRT & Public Bus Only` = col_double(),
##   `Other Combinations of MRT or Public Bus` = col_double(),
##   `Taxi Only` = col_double(),
##   `Car Only` = col_double(),
##   `Private Chartered Bus/Van Only` = col_double(),
##   `Lorry/Pickup Only` = col_character(),
##   `Motorcycle/Scooter Only` = col_double(),
##   Others = col_double(),
##   `No Transport Required` = col_double()
## )

View content

glimpse(transport)

## Rows: 30
## Columns: 13
## $ `Planning Area`                           <chr> "Total", "Ang Mo Kio", "B...
## $ Total                                     <dbl> 2147.8, 100.9, 150.1, 49....
## $ `Public Bus Only`                         <dbl> 353.6, 16.6, 32.5, 6.2, 1...
## $ `MRT Only`                                <dbl> 257.7, 11.9, 11.9, 6.5, 1...
## $ `MRT & Public Bus Only`                   <dbl> 533.4, 26.7, 34.8, 14.0, ...
## $ `Other Combinations of MRT or Public Bus` <dbl> 115.5, 4.4, 6.0, 1.8, 3.7...
## $ `Taxi Only`                               <dbl> 28.4, 1.1, 3.0, 0.6, 0.5,...
## $ `Car Only`                                <dbl> 470.0, 20.7, 36.9, 14.3, ...
## $ `Private Chartered Bus/Van Only`          <dbl> 58.3, 2.4, 3.1, 0.7, 2.5,...
## $ `Lorry/Pickup Only`                       <chr> "42.8", "2.3", "2.0", "0....
## $ `Motorcycle/Scooter Only`                 <dbl> 73.2, 3.0, 5.2, 0.8, 2.5,...
## $ Others                                    <dbl> 32.1, 1.4, 2.1, 0.2, 0.8,...
## $ `No Transport Required`                   <dbl> 182.7, 10.3, 12.6, 3.6, 6...

5.2.2 Planning area

Import

plan_area <- st_read(dsn = 'data/geospatial', layer = 'MP14_PLNG_AREA_NO_SEA_PL')

## Reading layer `MP14_PLNG_AREA_NO_SEA_PL' from data source `C:\Users\Xiao Rong\Desktop\School\Visual Analytics for Business Intelligence\Assignments\Assignment 5\Assignment 5\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

View content

glimpse(plan_area)

## Rows: 55
## Columns: 13
## $ OBJECTID   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
## $ PLN_AREA_N <chr> "ANG MO KIO", "BEDOK", "BISHAN", "BOON LAY", "BUKIT BATO...
## $ PLN_AREA_C <chr> "AM", "BD", "BS", "BL", "BK", "BM", "BP", "BT", "GL", "K...
## $ CA_IND     <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "...
## $ REGION_N   <chr> "NORTH-EAST REGION", "EAST REGION", "CENTRAL REGION", "W...
## $ REGION_C   <chr> "NER", "ER", "CR", "WR", "WR", "CR", "WR", "CR", "CR", "...
## $ INC_CRC    <chr> "E5CBDDE0C2113055", "1719251260799DF6", "BA616285F402846...
## $ FMEL_UPD_D <date> 2016-05-11, 2016-05-11, 2016-05-11, 2016-05-11, 2016-05...
## $ X_ADDR     <dbl> 28976.88, 38582.67, 28789.76, 13410.38, 19255.42, 26865....
## $ Y_ADDR     <dbl> 40229.12, 34032.10, 37450.89, 33008.99, 37527.65, 28662....
## $ SHAPE_Leng <dbl> 17494.24, 21872.80, 13517.12, 18528.47, 15234.22, 29156....
## $ SHAPE_Area <dbl> 13941380, 21733188, 7618921, 8279408, 11133256, 14462472...
## $ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((30658.5 420..., MULTIPOLYGO...

5.2.3 Bus stops

Import

bus_stops <- st_read(dsn = 'data/geospatial', layer = 'BusStop')

## Reading layer `BusStop' from data source `C:\Users\Xiao Rong\Desktop\School\Visual Analytics for Business Intelligence\Assignments\Assignment 5\Assignment 5\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 5040 features and 3 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 4427.938 ymin: 26482.1 xmax: 48282.5 ymax: 52983.82
## projected CRS:  SVY21

View content

glimpse(bus_stops)

## Rows: 5,040
## Columns: 4
## $ BUS_STOP_N <chr> "78221", "63359", "64141", "83139", "55231", "55351", "9...
## $ BUS_ROOF_N <chr> "B06", "B01", "B13", "B07", "B02", "B03", "B10", "B06", ...
## $ LOC_DESC   <chr> NA, "HOUGANG SWIM CPLX", "AFT JLN TELAWI", "AFT JOO CHIA...
## $ geometry   <POINT [m]> POINT (42227.96 39563.16), POINT (34065.75 39047.4...

5.2.4 Train stations

Import

train_stations <- st_read(dsn = 'data/geospatial', layer = 'MRTLRTStnPtt')

## Reading layer `MRTLRTStnPtt' from data source `C:\Users\Xiao Rong\Desktop\School\Visual Analytics for Business Intelligence\Assignments\Assignment 5\Assignment 5\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 185 features and 3 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
## projected CRS:  SVY21

View content

glimpse(train_stations)

## Rows: 185
## Columns: 4
## $ OBJECTID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,...
## $ STN_NAME <chr> "EUNOS MRT STATION", "CHINESE GARDEN MRT STATION", "KHATIB...
## $ STN_NO   <chr> "EW7", "EW25", "NS14", "NS7", "EW18", "NS5", "EW28", "EW20...
## $ geometry <POINT [m]> POINT (35782.96 33560.08), POINT (16790.75 36056.3),...

5.3 Data wrangling

Data will be pre-processed before the data visualisation can be created.

5.3.1 Define projection

For spatial data, the coordinate reference system (CRS) will have to be defined.
As the geographical area for the data is Singapore, the spatial data will use EPSG:3414 (SVY21 / Singapore TM), a projected coordinate system for Singapore.
Initial data exploration shows that the spatial data is already projected in SVY21. Hence, no coordinate transformation is required; the EPSG:3414 code is simply assigned to these spatial data sets.
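The difference between assigning a CRS and transforming to one can be sketched with a single point. The example below is an illustration only: the object names are hypothetical, and the point is placed at the false easting and northing of SVY21, so a transformation to WGS 84 (EPSG:4326) should land near the natural origin of the projection.

```r
library(sf)

# A single point at the false easting/northing of SVY21, initially without a CRS
pt <- st_sfc(st_point(c(28001.642, 38744.572)))

# st_set_crs() only attaches CRS metadata: the coordinates stay the same
pt_svy21 <- st_set_crs(pt, 3414)
coords_before <- st_coordinates(pt)
coords_after <- st_coordinates(pt_svy21)

# st_transform() reprojects: the coordinates change to the target CRS
pt_wgs84 <- st_transform(pt_svy21, 4326)
coords_wgs84 <- st_coordinates(pt_wgs84)
```

Here `coords_before` and `coords_after` are identical, whereas `coords_wgs84` holds a longitude and latitude of roughly 103.83 and 1.37. Assignment is therefore appropriate only when the coordinates are already expressed in the CRS being assigned, as is the case for these data sets.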

5.3.1.1 Planning area

Assign CRS

plan_area <- st_set_crs(plan_area, 3414)

Check CRS

st_crs(plan_area)

## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]

5.3.1.2 Bus stops

Assign CRS

bus_stops <- st_set_crs(bus_stops, 3414)

Check CRS

st_crs(bus_stops)

## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]

5.3.1.3 Train stations

Assign CRS

train_stations <- st_set_crs(train_stations, 3414)

Check CRS

st_crs(train_stations)

## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]

5.3.2 Handle missing data

Upon initial data checks, there are missing values in the mode of transport data set, for the category “Lorry/Pickup Only”.

The missing values for this particular mode of transport can be derived by subtracting the number of residents taking all other modes of transport, from the total number of residents in that particular planning area.

transport <- transport %>%
  # Convert to numeric; non-numeric entries become NA
  mutate(`Lorry/Pickup Only` = as.double(`Lorry/Pickup Only`)) %>%
  # Fill each NA with Total minus all other categories (columns 3-9 and 11-13),
  # floored at 0 so that the derived count is never negative
  mutate(`Lorry/Pickup Only` = ifelse(is.na(`Lorry/Pickup Only`),
                                      pmax(Total - rowSums(.[3:9]) - rowSums(.[11:13]), 0),
                                      `Lorry/Pickup Only`))
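To see why the derived value is floored at zero, the same residual can be worked by hand for a row whose “Lorry/Pickup Only” value is known. The sketch below uses the Ang Mo Kio figures (in thousands) from the glimpse() output above; the variable names are hypothetical.

```r
# Ang Mo Kio: total and every category except Lorry/Pickup Only
total <- 100.9
other_modes <- c(16.6, 11.9, 26.7, 4.4, 1.1, 20.7, 2.4, 3.0, 1.4, 10.3)

# Residual = total minus all other categories, floored at 0
derived <- round(max(total - sum(other_modes), 0), 1)

# Value actually reported for Lorry/Pickup Only in this row
reported <- 2.3
difference <- round(derived - reported, 1)
```

The residual is 2.4 against a reported 2.3, a gap of 0.1 that presumably reflects rounding in the published figures. For a small category, the same kind of discrepancy could push the residual slightly below zero, which is what the floor guards against.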

5.3.3 Data calculation I

Several fields will be calculated for the mode of transport, for usage in the data visualisation.

Percentage of residents in each planning area travelling to work by public transport: this percentage will be mapped spatially, so it has to be calculated. Travel by public transport is defined as travel by:
- Public bus only
- MRT only
- MRT and public bus only
- Other combinations of MRT or public bus
Percentage of residents in each planning area travelling to work by private transport: this percentage will also be mapped spatially, so it has to be calculated. Travel by private transport is defined as travel by:
- Car only
- Taxi only
- Motorcycle or scooter only
Percentage of residents in each planning area travelling to work by other modes of transport: this percentage, covering residents whose main mode is neither public nor private transport, will also be visualised, so it has to be calculated. Other modes of transport are defined as:
- Private chartered bus or van only
- Lorry or pickup only
- Others
- No transport required

Calculate

transport <- transport %>%
  # Calculate percentage that travel by public transport (columns 3-6)
  mutate(perc_public = round(rowSums(.[3:6]) / Total * 100, 1)) %>%
  # Calculate percentage that travel by private transport
  mutate(perc_private = round((`Taxi Only` + `Car Only` + `Motorcycle/Scooter Only`) / Total * 100, 1)) %>%
  # Calculate percentage that travel by other modes (columns 9-10 and 12-13)
  mutate(perc_others = round((rowSums(.[9:10]) + rowSums(.[12:13])) / Total * 100, 1))

View content

glimpse(transport)

## Rows: 30
## Columns: 16
## $ `Planning Area`                           <chr> "Total", "Ang Mo Kio", "B...
## $ Total                                     <dbl> 2147.8, 100.9, 150.1, 49....
## $ `Public Bus Only`                         <dbl> 353.6, 16.6, 32.5, 6.2, 1...
## $ `MRT Only`                                <dbl> 257.7, 11.9, 11.9, 6.5, 1...
## $ `MRT & Public Bus Only`                   <dbl> 533.4, 26.7, 34.8, 14.0, ...
## $ `Other Combinations of MRT or Public Bus` <dbl> 115.5, 4.4, 6.0, 1.8, 3.7...
## $ `Taxi Only`                               <dbl> 28.4, 1.1, 3.0, 0.6, 0.5,...
## $ `Car Only`                                <dbl> 470.0, 20.7, 36.9, 14.3, ...
## $ `Private Chartered Bus/Van Only`          <dbl> 58.3, 2.4, 3.1, 0.7, 2.5,...
## $ `Lorry/Pickup Only`                       <dbl> 42.8, 2.3, 2.0, 0.7, 1.8,...
## $ `Motorcycle/Scooter Only`                 <dbl> 73.2, 3.0, 5.2, 0.8, 2.5,...
## $ Others                                    <dbl> 32.1, 1.4, 2.1, 0.2, 0.8,...
## $ `No Transport Required`                   <dbl> 182.7, 10.3, 12.6, 3.6, 6...
## $ perc_public                               <dbl> 58.7, 59.1, 56.8, 57.7, 5...
## $ perc_private                              <dbl> 26.6, 24.6, 30.0, 31.8, 2...
## $ perc_others                               <dbl> 14.7, 16.3, 13.2, 10.5, 1...

5.3.4 Join data

The attribute data for mode of transport to work will be joined with the spatial planning area data, such that the values for mode of transport to work can be mapped spatially across the different planning areas in Singapore.

The “Planning Area” field in the mode of transport data set will be converted to upper case so that it matches the PLN_AREA_N field in the planning area data set, allowing the two to be joined on this common field.

Data join

transport_sg <- transport %>%
  # Convert to upper case
  mutate(`Planning Area` = toupper(`Planning Area`))

# Join data sets
transport_sg <- left_join(plan_area, transport_sg, by = c('PLN_AREA_N' = 'Planning Area'))

View content

glimpse(transport_sg)

## Rows: 55
## Columns: 28
## $ OBJECTID                                  <int> 1, 2, 3, 4, 5, 6, 7, 8, 9...
## $ PLN_AREA_N                                <chr> "ANG MO KIO", "BEDOK", "B...
## $ PLN_AREA_C                                <chr> "AM", "BD", "BS", "BL", "...
## $ CA_IND                                    <chr> "N", "N", "N", "N", "N", ...
## $ REGION_N                                  <chr> "NORTH-EAST REGION", "EAS...
## $ REGION_C                                  <chr> "NER", "ER", "CR", "WR", ...
## $ INC_CRC                                   <chr> "E5CBDDE0C2113055", "1719...
## $ FMEL_UPD_D                                <date> 2016-05-11, 2016-05-11, ...
## $ X_ADDR                                    <dbl> 28976.88, 38582.67, 28789...
## $ Y_ADDR                                    <dbl> 40229.12, 34032.10, 37450...
## $ SHAPE_Leng                                <dbl> 17494.24, 21872.80, 13517...
## $ SHAPE_Area                                <dbl> 13941380, 21733188, 76189...
## $ Total                                     <dbl> 100.9, 150.1, 49.4, NA, 7...
## $ `Public Bus Only`                         <dbl> 16.6, 32.5, 6.2, NA, 10.8...
## $ `MRT Only`                                <dbl> 11.9, 11.9, 6.5, NA, 10.8...
## $ `MRT & Public Bus Only`                   <dbl> 26.7, 34.8, 14.0, NA, 18....
## $ `Other Combinations of MRT or Public Bus` <dbl> 4.4, 6.0, 1.8, NA, 3.7, 2...
## $ `Taxi Only`                               <dbl> 1.1, 3.0, 0.6, NA, 0.5, 1...
## $ `Car Only`                                <dbl> 20.7, 36.9, 14.3, NA, 17....
## $ `Private Chartered Bus/Van Only`          <dbl> 2.4, 3.1, 0.7, NA, 2.5, 0...
## $ `Lorry/Pickup Only`                       <dbl> 2.3, 2.0, 0.7, NA, 1.8, 1...
## $ `Motorcycle/Scooter Only`                 <dbl> 3.0, 5.2, 0.8, NA, 2.5, 2...
## $ Others                                    <dbl> 1.4, 2.1, 0.2, NA, 0.8, 0...
## $ `No Transport Required`                   <dbl> 10.3, 12.6, 3.6, NA, 6.1,...
## $ perc_public                               <dbl> 59.1, 56.8, 57.7, NA, 57....
## $ perc_private                              <dbl> 24.6, 30.0, 31.8, NA, 27....
## $ perc_others                               <dbl> 16.3, 13.2, 10.5, NA, 14....
## $ geometry                                  <MULTIPOLYGON [m]> MULTIPOLYGON...
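The effect of the case conversion, and the NA values that a left join leaves for planning areas without a survey row, can be reproduced with a small base R sketch. The toy objects below are hypothetical stand-ins for the two data sets, and match() is used in place of left_join() to keep the example free of package dependencies.

```r
# Toy stand-ins for the two data sets (values taken from the output above)
plan_area_names <- c("ANG MO KIO", "BEDOK", "BOON LAY")
survey <- data.frame(
  planning_area = c("Ang Mo Kio", "Bedok"),
  perc_public = c(59.1, 56.8)
)

# Without case conversion, none of the keys match
idx_raw <- match(plan_area_names, survey$planning_area)

# After converting to upper case, the keys line up
idx_upper <- match(plan_area_names, toupper(survey$planning_area))

# Left-join behaviour: every planning area is kept, unmatched ones get NA
joined <- data.frame(
  PLN_AREA_N = plan_area_names,
  perc_public = survey$perc_public[idx_upper]
)
```

In this sketch, Boon Lay has no row in the survey table, so its `perc_public` is NA, mirroring the NA values for Boon Lay in the joined output above.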

5.3.5 Data imputation

Some planning areas do not have their own row in the mode of transport data set; the survey reports them only as a combined row labelled ‘Others’ (the last row of the data set). Hence, for these planning areas, the percentages of residents who travel to work by public or private transport are taken from the ‘Others’ row and assigned to each of them.

# Retrieve percentage of public/private transport from the category 'Others'
others_public <- transport[[30, 14]]
others_private <- transport[[30, 15]]

# Assign values to the planning areas
transport_sg$perc_public[is.na(transport_sg$perc_public)] <- others_public
transport_sg$perc_private[is.na(transport_sg$perc_private)] <- others_private

Data sanity checks show that the Western Islands, Southern Islands, North-Eastern Islands, Western Water Catchment and Central Water Catchment should not have any values for mode of transport, as there are no residents living there. The values for these areas are therefore set back to NA.

# Planning areas with no residents
no_residents <- c("WESTERN ISLANDS", "SOUTHERN ISLANDS", "NORTH-EASTERN ISLANDS",
                  "CENTRAL WATER CATCHMENT", "WESTERN WATER CATCHMENT")

# Reset the imputed percentages for these areas to NA
transport_sg$perc_public[transport_sg$PLN_AREA_N %in% no_residents] <- NA
transport_sg$perc_private[transport_sg$PLN_AREA_N %in% no_residents] <- NA

5.3.6 Data calculation II

The density of public transport in each planning area will also be visualised spatially, so these values have to be calculated. Public transport density in a given planning area is defined as the total number of public transport facilities (bus stops and train stations) per square kilometre of the planning area.

\[Public\ transport\ density\ (number\ per\ square\ kilometre) = \frac{Number\ of\ bus\ stops\ + Number\ of\ train\ stations}{Area\ of\ planning\ area\ (km^2)}\]

Calculate

# Count number of bus stops in each planning area
transport_sg$num_bus_stops <- lengths(st_intersects(transport_sg, bus_stops))

# Count number of train stations in each planning area
transport_sg$num_train_stations <- lengths(st_intersects(transport_sg, train_stations))

# Calculate public transport density
transport_sg <- transport_sg %>%
  mutate(public_transport_density = (num_bus_stops + num_train_stations) / SHAPE_Area * 1000000)

View content

glimpse(transport_sg)

## Rows: 55
## Columns: 31
## $ OBJECTID                                  <int> 1, 2, 3, 4, 5, 6, 7, 8, 9...
## $ PLN_AREA_N                                <chr> "ANG MO KIO", "BEDOK", "B...
## $ PLN_AREA_C                                <chr> "AM", "BD", "BS", "BL", "...
## $ CA_IND                                    <chr> "N", "N", "N", "N", "N", ...
## $ REGION_N                                  <chr> "NORTH-EAST REGION", "EAS...
## $ REGION_C                                  <chr> "NER", "ER", "CR", "WR", ...
## $ INC_CRC                                   <chr> "E5CBDDE0C2113055", "1719...
## $ FMEL_UPD_D                                <date> 2016-05-11, 2016-05-11, ...
## $ X_ADDR                                    <dbl> 28976.88, 38582.67, 28789...
## $ Y_ADDR                                    <dbl> 40229.12, 34032.10, 37450...
## $ SHAPE_Leng                                <dbl> 17494.24, 21872.80, 13517...
## $ SHAPE_Area                                <dbl> 13941380, 21733188, 76189...
## $ Total                                     <dbl> 100.9, 150.1, 49.4, NA, 7...
## $ `Public Bus Only`                         <dbl> 16.6, 32.5, 6.2, NA, 10.8...
## $ `MRT Only`                                <dbl> 11.9, 11.9, 6.5, NA, 10.8...
## $ `MRT & Public Bus Only`                   <dbl> 26.7, 34.8, 14.0, NA, 18....
## $ `Other Combinations of MRT or Public Bus` <dbl> 4.4, 6.0, 1.8, NA, 3.7, 2...
## $ `Taxi Only`                               <dbl> 1.1, 3.0, 0.6, NA, 0.5, 1...
## $ `Car Only`                                <dbl> 20.7, 36.9, 14.3, NA, 17....
## $ `Private Chartered Bus/Van Only`          <dbl> 2.4, 3.1, 0.7, NA, 2.5, 0...
## $ `Lorry/Pickup Only`                       <dbl> 2.3, 2.0, 0.7, NA, 1.8, 1...
## $ `Motorcycle/Scooter Only`                 <dbl> 3.0, 5.2, 0.8, NA, 2.5, 2...
## $ Others                                    <dbl> 1.4, 2.1, 0.2, NA, 0.8, 0...
## $ `No Transport Required`                   <dbl> 10.3, 12.6, 3.6, NA, 6.1,...
## $ perc_public                               <dbl> 59.1, 56.8, 57.7, 41.6, 5...
## $ perc_private                              <dbl> 24.6, 30.0, 31.8, 35.5, 2...
## $ perc_others                               <dbl> 16.3, 13.2, 10.5, NA, 14....
## $ geometry                                  <MULTIPOLYGON [m]> MULTIPOLYGON...
## $ num_bus_stops                             <int> 168, 269, 95, 74, 160, 18...
## $ num_train_stations                        <int> 3, 6, 3, 0, 3, 6, 10, 6, ...
## $ public_transport_density                  <dbl> 12.265644, 12.653459, 12....

5.4 Create charts

5.4.1 Mode of transport composition

A ternary plot will be created to visualise the composition/ratio of residents’ usual modes of transport to work: public transport, private transport, or other modes of transport.

plot_ly() will be utilised to create the interactive ternary plot, with type = 'scatterternary'
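
A ternary plot positions each point by the relative shares of its three components, so it is worth confirming that the three percentages form a composition. The sketch below uses the first three rows of perc_public, perc_private and perc_others copied from the glimpse(transport_sg) output; the names comp, parts, totals and shares are introduced only for this illustration.

```r
# Illustrative check only: first three rows copied from glimpse(transport_sg)
comp <- data.frame(
  PLN_AREA_C   = c("AM", "BD", "BS"),
  perc_public  = c(59.1, 56.8, 57.7),
  perc_private = c(24.6, 30.0, 31.8),
  perc_others  = c(16.3, 13.2, 10.5)
)

parts  <- as.matrix(comp[, c("perc_public", "perc_private", "perc_others")])
totals <- rowSums(parts)

# Divide every row by its own total to obtain shares between 0 and 1
shares <- parts / totals

print(totals)
print(round(shares, 3))
```

For these rows the three percentages sum to 100, so each planning area maps to a single point inside the triangle.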

ternary_data <- st_set_geometry(transport_sg, NULL)

axis <- function(txt) {
  list(
    title = txt, tickformat = "%", tickfont = list(size = 10)
  )
}

ternaryAxes <- list(
  aaxis = axis("Other Modes"), 
  baxis = axis("\nPrivate Transport"), 
  caxis = axis("\nPublic Transport")
)

# Create plotly ternary chart
transport_ternary<- plot_ly(ternary_data, 
                            a = ~perc_others, 
                            b = ~perc_private, 
                            c = ~perc_public,
                            color = ~REGION_N,
                            colors = 'Set1',
                            type = 'scatterternary',
                            mode = 'markers',
                            marker = list(symbol = 'circle',
                                          opacity = 0.6,
                                          size = 14,
                                          line=list(width = 0.5, color = '#666666')),
  
                            # Edit hover information
                            text = ~paste('<b>', PLN_AREA_N, '</b><br>',
                                          'Public Transport Travel: ', perc_public, '%<br>',
                                          'Private Transport Travel: ', perc_private, '%<br>',
                                          'Other Modes of Transport: ', perc_others, '%'),
                            hoverinfo = 'text') %>%
  layout(ternary = ternaryAxes,
         margin = list(t = 50, b = 50))

transport_ternary

5.4.2 Public transport across planning areas

A choropleth map will be created to visualise the percentage of residents utilising public transport as their main mode of transport across planning areas. Additionally, a proportional symbol map with graduated colours visualising public transport density will be constructed and overlaid on top of the choropleth map.

tmap will be utilised to create the spatial thematic maps.
tmap_mode('view') is used to enable interactive maps in tmap.
tm_fill() and tm_borders() will be utilised to create the choropleth map.
tm_bubbles() will be utilised to create the proportional symbol map.

Enable interactive mode for tmap

tmap_mode('view')

## tmap mode set to interactive viewing

public_map <- tm_shape(transport_sg) +
  
  # Create choropleth map, coloured by percentage of population using public transportation
  tm_fill(col = 'perc_public',
          palette = 'Oranges',
          alpha = 0.8,
          # set mouse-over information
          id = 'PLN_AREA_N',
          # set layer name for control menu
          group = 'Percentage Travelling by Public Transport to Work',
          # format NA areas
          textNA = 'NA',
          colorNA = 'gray85',
          # create 5 classes
          n = 5,
          # classification method - 'pretty'
          style = 'pretty',
          # rename legend title
          title = '% Public Transport Travel',
          # format legend to show percentages
          legend.format=list(fun=function(x) paste0(formatC(x, digits=0, format="f"), "%")),
          # customise pop-up tooltip information
          popup.vars = c("% Travelling to Work by Public Transport :" = "perc_public"),
          popup.format = list(perc_public = list(fun=function(x) ifelse(is.na(x), 'NA', paste0(formatC(x, digits=0, format="f"), "%"))))) +
  tm_borders(lwd = 0.3, alpha = 0.5) +
  
  # Create proportional symbol map of public transportation density
  tm_bubbles(size = 'public_transport_density',
             col = 'public_transport_density',
             border.lwd = 0.1,
             palette = 'Greys',
             # set mouse-over information
             id = 'PLN_AREA_N',
             # set layer name for control menu
             group = 'Public Transport Density',
             # customise pop-up tooltip information
             popup.vars = c("Public Transport Density" = "public_transport_density",
                            "Number of Bus Stops" = "num_bus_stops",
                            "Number of Train Stations" = "num_train_stations"),
             popup.format = list(public_transport_density = 
                                   list(digits = 1)),
             # disable legend
             legend.col.show = FALSE) +
  
  # Customise layout and zoom settings
  tm_view(
    # Set zoom limits such that entire SG map is shown initially
    set.zoom.limits = c(10, 14),
    # Place legend at bottom right hand corner
    view.legend.position = c('right', 'bottom'),
    # Set bounds so that user will not accidentally pan away from Singapore map
    # set.bounds = TRUE
    )

public_map
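
The maps in this section request their classes with style = 'pretty' and a value of n. As a rough illustration of what that means, the sketch below calls base R's pretty() directly. It assumes tmap's 'pretty' style behaves like pretty(), which is not confirmed here, and the vector x holds made-up percentages rather than the survey values.

```r
# Hypothetical percentages, for illustration only (not the survey values)
x <- c(41.6, 48.3, 52.0, 56.8, 59.1, 63.4, 68.9)

# n is the desired number of intervals, not a guaranteed one
breaks_5 <- pretty(x, n = 5)
breaks_6 <- pretty(x, n = 6)

print(breaks_5)
print(breaks_6)

# Number of classes actually produced for each request
c(requested_5 = length(breaks_5) - 1, requested_6 = length(breaks_6) - 1)
```

Because pretty() favours round break values, the number of classes it returns can differ from the number requested, which is worth keeping in mind when comparing legends across maps.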

5.4.3 Private transport across planning areas

A choropleth map will be created to visualise the percentage of residents utilising private transport as their main mode of transport across planning areas.

tm_fill() and tm_borders() will be utilised to create the choropleth map.

private_map <- tm_shape(transport_sg) +
  
  # Create choropleth map, coloured by percentage of population using private transportation
  tm_fill(col = 'perc_private',
          palette = 'Oranges',
          alpha = 0.8,
          # set mouse-over information
          id = 'PLN_AREA_N',
          # set layer name for control menu
          group = 'Percentage Travelling by Private Transport to Work',
          # format NA areas
          textNA = 'NA',
          colorNA = 'gray85',
          # request 6 classes (n is a preferred number of classes)
          n = 6,
          # classification method - 'pretty'
          style = 'pretty',
          # rename legend title
          title = '% Private Transport Travel',
          # format legend to show percentages
          legend.format=list(fun=function(x) paste0(formatC(x, digits=0, format="f"), "%")),
          # customise pop-up tooltip information
          popup.vars = c("% Travelling to Work by Private Transport :" = "perc_private"),
          popup.format = list(perc_private = list(fun=function(x) ifelse(is.na(x), 'NA', paste0(formatC(x, digits=0, format="f"), "%"))))) +
  tm_borders(lwd = 0.3, alpha = 0.5) +

  # Customise layout and zoom settings
  tm_view(
    # Set zoom limits such that entire SG map is shown initially
    set.zoom.limits = c(10, 14),
    # Place legend at bottom right hand corner
    view.legend.position = c('right', 'bottom'),
    # Set bounds so that user will not accidentally pan away from Singapore map
    # set.bounds = TRUE
    )

private_map
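
The legend.format and popup.format arguments above both rely on small formatting functions. The sketch below pulls them out so their behaviour can be inspected on their own; the names fmt_percent and fmt_popup are hypothetical and do not appear in the original code.

```r
# Same logic as in legend.format: round to whole numbers and append "%"
fmt_percent <- function(x) paste0(formatC(x, digits = 0, format = "f"), "%")

# Same logic as in popup.format: show 'NA' for missing values instead
fmt_popup <- function(x) ifelse(is.na(x), "NA", fmt_percent(x))

fmt_percent(c(24.6, 30, 62.9))
fmt_popup(c(24.6, NA, 62.9))
```

The is.na() guard matters for the pop-ups: without it, a missing value would be formatted as "NA%".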

6 Final Visualisation

The final visualisation is created by combining all the charts from the previous section, following the planned layout of the sketch.

How Does Work Travel Mode Choice Differ Across Planning Areas in Singapore in 2015?

Composition of Work Travel Mode Choice

Main Mode of Transport to Work: Public vs. Private Transport

Note: A larger and darker grey circle indicates a higher density of public transport infrastructure (bus stops and train stations)

Source: Department of Statistics Singapore, General Household Survey 2015

7 Description & Insights of Viz

The interactive visualisation analyses the choice of work travel mode by residents in Singapore across planning areas.

1. Composition of Work Travel Mode Choice

This chart reveals the distribution of the relative proportions of work travel mode choice for residents of different planning areas, categorised by region.

Generally, public transport is the most popular travel mode, making up about 50% to 70% of work travel across the planning areas. There are only a few exceptions, like Tanglin and Bukit Timah (located in the Central region), which have a relatively higher proportion of residents (62.9% and 56% respectively) choosing private transport as their main mode of transport. Additionally, Outram (also in the Central region) has a relatively higher percentage of residents taking other modes of transport to work (30.8%).

2. Main Mode of Transport to Work

2a. Public Transport

This map enables the spatial comparison of the percentage of residents choosing public transport as their work travel mode, together with the availability of public transport infrastructure which is indicated by public transport density.

Several planning areas in the north, central and west regions have a relatively higher percentage of residents who travel to work by public transport. However, not all planning areas have equal availability of public transport infrastructure. Although areas like Jurong East and Yishun have a very high percentage of public transport travel, public transport density is relatively low in these areas. This suggests that the capacity of public transport infrastructure in these planning areas may not be meeting the demands of their residents. Other planning areas with relatively low availability of public transport infrastructure include Paya Lebar, Tuas and Lim Chu Kang.

2b. Private Transport

This map enables spatial comparison of the percentage of residents choosing private transport as their work travel mode.

Bukit Timah and Tanglin have the highest proportion of residents travelling to work by private transport. As the map on the left shows, they also have relatively low public transport density and a relatively low share of public transport travel, which could contribute to the higher usage of private transport. In contrast, Toa Payoh and Outram have the lowest proportion of residents travelling by private transport.

Data Visualisation on Mode of Transport to Work in Singapore

IS428 Visual Analytics and Applications | Assignment 5

Xiao Rong Wong

11/1/2020
