1. Overview

A short description of not more than 350 words.

Description: The aim of this visualisation is to analyse Singaporeans’ commuting and transport habits. Aside from car usage, Singapore’s well-established public transport system offers residents many options for commuting, such as Mass Rapid Transit (MRT) train services and public buses.

This data is taken from Singapore’s General Household Survey 2015, and comprises two tables. The two tables are about Mode of Transport to work and Travelling Time to work. Hence, analysis of the data aims to yield insights about what modes of transport Singaporean residents use to commute to work, and how long their commuting journey to work is.

Insights from visualisation: The first insight from this visualisation is that residents of the planning areas of Jurong, Yishun, Hougang, Bedok and Tampines account for the highest numbers of MRT & Public Bus Only users, as well as Car Only users. The second insight is that Woodlands has the highest number of residents with a long Travel Time in Singapore, with 21 thousand residents declaring a travel time of more than 60 minutes.

2. Final Data Visualisation

2.1 Mode of Transport

This interactive map allows you to visualise the rate of public transport usage across planning areas in Singapore. The colour intensity on the map represents the level of MRT & Public Bus Only usage, and the data is segmented by quantiles. Hover over each coloured planning area to view the name of the planning area. Click on a planning area to view more details about MRT & Public Bus Only usage.

Number of Residents who use MRT & Public Bus Only (thousands)

As we can see, areas such as Jurong, Yishun, Hougang, Bedok and Tampines are amongst the planning areas with the highest rates of MRT & Public Bus only usage, falling in the highest quantile range of 33.6-41.4 thousand residents indicating this option.

Below is another interactive map that shows the number of “Car Only” transport users in Singapore. The colour intensity on the map represents the level of Car Only usage, and the data is segmented by quantiles. Hover over each coloured planning area to view the name of the planning area. Click on a planning area to view more details about Car Only usage.

Number of Residents who use Car Only (thousands)

When compared to the previous map on MRT & Public Bus Only usage, the map on Car Only usage yields some surprising results. Contrary to the expectation that public transport acts as a substitute for car usage, regions such as Jurong, Hougang, Paya Lebar, Bedok and Tampines appear to have high levels of both Car Only usage and MRT & Public Bus Only usage. This may be because these areas have a relatively higher population density.

In this map, these aforementioned regions fall in the highest quantile of 21.36 - 36.90 thousand residents declaring Car Only usage.

2.2 Travel Times

This visualisation looks at how many residents across Singapore have a travel time of more than 60 minutes, which for the purposes of this analysis is defined as a long travel time. Hover over each coloured planning area to view the name of the planning area. Click on a planning area to view more details about the number of residents who have a long travel time.

Number of Residents with Long Travel Times (More Than 60 Mins)

Units in thousands

We can see that Woodlands has the highest number of residents with a long travelling time, as represented by the red colouring that corresponds to the highest class of 20 - 25 thousand.

The bar chart below is based on travel times for all Singapore residents, and is colour-coded according to the length of travel times. Hover over each bar for more details on the number of residents for each Travel Time category (please note that the Residents figures are in thousands).

Travel Times for All Residents

Units in thousands

Amongst the Travel Time categories, 16 - 30 Mins is the most common travel time for Singapore residents.

3. Preparation

3.1 Data processing

  1. Open RStudio, select File and then New Project. Name the new project “VA Assignment 5”. Once the project is open, select File again, then New File, and select R Markdown from the drop-down menu. Name the R Markdown file “Assignment 5”.

  2. Download the .csv file from the SingStat website and rename it from “OutputFile.csv” to “Transport.csv”. Save it in the same working directory that the R Markdown file is saved in.

  3. Open the .csv file in Excel. Remove unnecessary headings and notes from above and below the data.

  4. Since both tables have a “Total” column, rename the first “Total” column (in column B of the .csv file) to “Mode Total”, and rename the second “Total” column (in column N of the .csv file) to “Time Total”. This helps to distinguish the two totals from each of the tables.

  5. Fill in the first cell (cell A1) with “Planning Area”. Save the .csv file.

  6. Go to data.gov.sg and download the MP14 file in shp format. Save it to the working directory. Unzip the folder and make sure the unzipped folder is saved to the same working directory. Delete the original zipped folder.
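
The renaming in steps 4 and 5 could also be done in R instead of Excel. The following is only a minimal sketch on a made-up miniature of the table: the values, the reduced set of columns and the placeholder name "...1" for the unnamed first column are all assumptions for illustration, not taken from the survey file.

```r
# Hypothetical miniature of the combined table. Values are made up.
raw <- data.frame(
  a = c("Ang Mo Kio", "Bedok"),
  b = c(10, 20), c = c(4, 9), d = c(10, 20), e = c(3, 5)
)
# Assumed raw headers: an unnamed first column and two "Total" columns
names(raw) <- c("...1", "Total", "Car Only", "Total", "Up To 15 Mins")

# Rename by position: the first "Total" belongs to the mode-of-transport
# table, the second to the travelling-time table.
total_cols <- which(names(raw) == "Total")
names(raw)[total_cols] <- c("Mode Total", "Time Total")
names(raw)[1] <- "Planning Area"

print(names(raw))
```

Renaming by position rather than by hand keeps the step repeatable if the file is downloaded again.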

3.2 Making the visualisation using tmap and ggplot

library(tidyverse)
library(dplyr)
library(ggplot2)
library(plotly)
library(data.table)
library(sf)
library(tmap)

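Read in the data and capitalise the planning area names so that they match the upper-case PLN_AREA_N field in the shapefile. Then read in the shapefile and join the two by planning area.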

transport_data <- read_csv("Transport.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   `Planning Area` = col_character(),
##   `Mode Total` = col_number(),
##   `Public Bus Only` = col_double(),
##   `MRT Only` = col_double(),
##   `MRT & Public Bus Only` = col_double(),
##   `Other Combinations Of MRT Or Public Bus` = col_double(),
##   `Taxi Only` = col_double(),
##   `Car Only` = col_double(),
##   `Private Chartered Bus/Van Only` = col_double(),
##   `Lorry/Pickup Only` = col_double(),
##   `Motorcycle/ Scooter Only` = col_double(),
##   Others = col_double(),
##   `No Transport Required` = col_double(),
##   `Time Total` = col_number(),
##   `Up To 15 Mins` = col_double(),
##   `16 - 30 Mins` = col_double(),
##   `31 - 45 Mins` = col_double(),
##   `46 - 60 Mins` = col_double(),
##   `More Than 60 Mins` = col_double()
## )
transport_data$`Planning Area` <- toupper(transport_data$`Planning Area`)
mpsz <- st_read(dsn = "C:/Users/Lynnette/Documents/Class Content/Current folder (change once uploaded to hard disk)/SMU Y3S2/Visual Analytics/Assignment 5/VA Assignment 5/master-plan-2014-subzone-boundary-web-shp", 
                layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\Lynnette\Documents\Class Content\Current folder (change once uploaded to hard disk)\SMU Y3S2\Visual Analytics\Assignment 5\VA Assignment 5\master-plan-2014-subzone-boundary-web-shp' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
mpsz
## Simple feature collection with 323 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
## First 10 features:
##    OBJECTID SUBZONE_NO       SUBZONE_N SUBZONE_C CA_IND      PLN_AREA_N
## 1         1          1    MARINA SOUTH    MSSZ01      Y    MARINA SOUTH
## 2         2          1    PEARL'S HILL    OTSZ01      Y          OUTRAM
## 3         3          3       BOAT QUAY    SRSZ03      Y SINGAPORE RIVER
## 4         4          8  HENDERSON HILL    BMSZ08      N     BUKIT MERAH
## 5         5          3         REDHILL    BMSZ03      N     BUKIT MERAH
## 6         6          7  ALEXANDRA HILL    BMSZ07      N     BUKIT MERAH
## 7         7          9   BUKIT HO SWEE    BMSZ09      N     BUKIT MERAH
## 8         8          2     CLARKE QUAY    SRSZ02      Y SINGAPORE RIVER
## 9         9         13 PASIR PANJANG 1    QTSZ13      N      QUEENSTOWN
## 10       10          7       QUEENSWAY    QTSZ07      N      QUEENSTOWN
##    PLN_AREA_C       REGION_N REGION_C          INC_CRC FMEL_UPD_D   X_ADDR
## 1          MS CENTRAL REGION       CR 5ED7EB253F99252E 2014-12-05 31595.84
## 2          OT CENTRAL REGION       CR 8C7149B9EB32EEFC 2014-12-05 28679.06
## 3          SR CENTRAL REGION       CR C35FEFF02B13E0E5 2014-12-05 29654.96
## 4          BM CENTRAL REGION       CR 3775D82C5DDBEFBD 2014-12-05 26782.83
## 5          BM CENTRAL REGION       CR 85D9ABEF0A40678F 2014-12-05 26201.96
## 6          BM CENTRAL REGION       CR 9D286521EF5E3B59 2014-12-05 25358.82
## 7          BM CENTRAL REGION       CR 7839A8577144EFE2 2014-12-05 27680.06
## 8          SR CENTRAL REGION       CR 48661DC0FBA09F7A 2014-12-05 29253.21
## 9          QT CENTRAL REGION       CR 1F721290C421BFAB 2014-12-05 22077.34
## 10         QT CENTRAL REGION       CR 3580D2AFFBEE914C 2014-12-05 24168.31
##      Y_ADDR SHAPE_Leng SHAPE_Area                       geometry
## 1  29220.19   5267.381  1630379.3 MULTIPOLYGON (((31495.56 30...
## 2  29782.05   3506.107   559816.2 MULTIPOLYGON (((29092.28 30...
## 3  29974.66   1740.926   160807.5 MULTIPOLYGON (((29932.33 29...
## 4  29933.77   3313.625   595428.9 MULTIPOLYGON (((27131.28 30...
## 5  30005.70   2825.594   387429.4 MULTIPOLYGON (((26451.03 30...
## 6  29991.38   4428.913  1030378.8 MULTIPOLYGON (((25899.7 297...
## 7  30230.86   3275.312   551732.0 MULTIPOLYGON (((27746.95 30...
## 8  30222.86   2208.619   290184.7 MULTIPOLYGON (((29351.26 29...
## 9  29893.78   6571.323  1084792.3 MULTIPOLYGON (((20996.49 30...
## 10 30104.18   3454.239   631644.3 MULTIPOLYGON (((24472.11 29...
transport_data1 <- left_join(mpsz, transport_data, 
                              by = c("PLN_AREA_N" = "Planning Area"))
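
A left join keeps every subzone and fills in NA wherever a planning area name has no match, so it can be worth checking the names before mapping. The following is a minimal sketch using made-up name vectors in place of mpsz$PLN_AREA_N and transport_data$`Planning Area`; the names listed are assumptions for illustration only.

```r
# Hypothetical planning-area names standing in for the real columns
shape_names  <- c("BEDOK", "TAMPINES", "WOODLANDS", "MARINA SOUTH")
survey_names <- toupper(c("Bedok", "Tampines", "Woodlands", "Others"))

# Areas on the map that would receive NA after the left join
unmatched_shape <- setdiff(unique(shape_names), survey_names)
# Survey rows that would never reach the map
unmatched_survey <- setdiff(survey_names, unique(shape_names))

print(unmatched_shape)
print(unmatched_survey)
```

Areas in the first list would be drawn as missing values on the map; rows in the second list are dropped by the join.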

Create an interactive map that visualises public transport use in Singapore. Use the variable “MRT & Public Bus Only” with the tmap elements tm_shape() and tm_polygons(). In tm_polygons(), set the colour palette to “Blues”, the classification style to “quantile”, and id to PLN_AREA_N so that hovering over a coloured area shows the name of its planning area.

Number of Residents who use MRT & Public Bus Only (thousands)

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(transport_data1)+ tm_polygons("MRT & Public Bus Only",id="PLN_AREA_N", palette="Blues", style="quantile", border.alpha = 0.5) # + tm_layout(title="Number of Residents who use MRT & Public Bus Only (thousands)", title.position = "center")
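
To illustrate what a quantile classification does, the following minimal sketch splits ten made-up values into five classes with base R. The values are hypothetical, and the choice of five classes is an assumption (as far as I am aware it matches tmap's default number of classes); tmap's own break computation may differ in detail.

```r
# Made-up resident counts (thousands) for ten areas; not survey values
users <- c(2.1, 5.4, 8.0, 12.3, 15.9, 19.2, 24.8, 30.1, 35.5, 41.4)

# Five quantile classes: breaks at the 0%, 20%, ..., 100% quantiles
breaks <- quantile(users, probs = seq(0, 1, by = 0.2))

# Assign every area to its class; include.lowest keeps the minimum value
classes <- cut(users, breaks = breaks, include.lowest = TRUE)

print(breaks)
print(table(classes))
```

Each class ends up with the same number of areas, which is why the class boundaries on a quantile map are usually not round numbers.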

Create a “Car Only” map using the same elements as in the previous map. Change the variable in tm_polygons() to “Car Only”. Change the colour palette to “OrRd” for greater contrast with the previous map.

Number of Residents who use Car Only (thousands)

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(transport_data1)+ tm_polygons("Car Only",id="PLN_AREA_N",palette="OrRd", style = "quantile", border.alpha=0.5) # +tm_layout(title="Number of Residents who use Car Only (thousands)")

Plot the number of residents with long travel times (more than 60 minutes) on a map, again using tm_shape() and tm_polygons() with the “OrRd” palette.

Concentration of Residents with Long Travel Times (More Than 60 Mins)

Units in thousands

tmap_mode("view")
## tmap mode set to interactive viewing
# tm_polygons() already draws both the fill and the borders. Extra tm_fill() and tm_borders() layers added after it are dropped by tmap as duplicated layer types, so they are left out here.
tm_shape(transport_data1) + tm_polygons("More Than 60 Mins",id="PLN_AREA_N",palette="OrRd") # + tm_layout(title= 'Concentration of Long Travel Times (More Than 60 Mins)',  title.position = c('right', 'top'))
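
The class boundaries on a map depend on the classification style. As a rough illustration with made-up values, base R's pretty() gives round, equally spaced boundaries, whereas quantile() gives boundaries that follow the data. As far as I understand, tmap falls back to a "pretty" style when no style reaches the layer, but the link between pretty() and tmap's internal computation is an assumption here.

```r
# Made-up counts (thousands); not survey values
long_trips <- c(0.5, 1.2, 3.8, 6.4, 9.9, 13.0, 17.5, 24.1)

# Round, equally spaced boundaries chosen to cover the data range
pretty_breaks <- pretty(long_trips, n = 5)

# Quantile boundaries follow the data, so they are rarely round numbers
quantile_breaks <- quantile(long_trips, probs = seq(0, 1, by = 0.2))

print(pretty_breaks)
print(quantile_breaks)
```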

Bar chart of travel times for the selected estate (Jurong West). First, isolate the row and the five travel-time columns that are needed from transport_data, and transpose them. Then, use setDT() with keep.rownames to move the travel-time categories from the row names into the first column. Because the data is taken from transport_data rather than the joined transport_data1, there is no geographic coordinate information to omit. Plot the graph using ggplot and ggplotly, and colour-code the bars by length of travel time.

Travel Times for Jurong West Residents

Units in thousands

total_data <- as.data.frame(t(transport_data[1,15:19]))
# Move the travel-time categories from the row names into a "Time Taken" column
new1 <- setDT(total_data, keep.rownames = "Time Taken")
Time_Taken <- new1$`Time Taken`
Residents <- new1$V1
positions <- c("Up To 15 Mins", "16 - 30 Mins", "31 - 45 Mins","46 - 60 Mins","More Than 60 Mins")
# Fill colours are assigned to the categories in alphabetical order, so the last colour belongs to "Up To 15 Mins"
ggplotly(ggplot(new1, aes(x = Time_Taken, y = Residents, fill = Time_Taken)) +
  geom_bar(stat = "identity") + labs(y = "Number of residents (thousands)", x = "Time Taken") +
  scale_x_discrete(limits = positions) +
  scale_fill_manual(values = c("#FACFC6", "#F4AA9B", "#F67055", "#F54C28", "#FFF0DC")) +
  theme(legend.position = "none"), tooltip = c("Residents"))
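
An alternative way to prepare a single row for a bar chart is to build a long data frame directly and store the categories as a factor. This is only a sketch under stated assumptions: the one-row data frame and its values are made up, and it is not the method used for the chart in this report.

```r
# Hypothetical one-row slice with the five travel-time columns (made-up values)
one_row <- data.frame(
  `Up To 15 Mins` = 3.1, `16 - 30 Mins` = 9.4, `31 - 45 Mins` = 7.2,
  `46 - 60 Mins` = 5.0, `More Than 60 Mins` = 2.3,
  check.names = FALSE
)

positions <- names(one_row)

# One row per category; the factor levels fix the order of the bars
long <- data.frame(
  Time_Taken = factor(positions, levels = positions),
  Residents  = unlist(one_row, use.names = FALSE)
)

print(long)
```

With the categories stored as a factor, ggplot keeps the bars in level order without scale_x_discrete(limits = ...), and unnamed colours passed to scale_fill_manual() would be assigned in level order rather than alphabetically.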

4. Major Data and Design Challenges

4.1 Cleaning and organising data

Since I used a dataset that combined two data tables, some columns had duplicate names, which made it difficult to use the data for analysis. Therefore, I renamed those columns to make it clear what each one referred to. I also had to subset and reshape some of the data, especially for the bar plot, because the data was not in a format suitable for bar plotting.

4.2 Designing challenges

When I used tm_layout() to specify the map titles, the titles did not appear where they should have after knitting (the tm_layout() calls are commented out in Section 3 for reference). Hence, I used headers in place of titles for the maps. I also added a header to the bar plot for the sake of standardisation, even though the bar plot’s title attribute worked properly during knitting.

To set up the comparison of Modes of Transport in Section 2, I had to differentiate the two Modes of Transport maps. I did so by using the “Blues” colour palette for the first map and the “OrRd” colour palette for the second, in order to create visual contrast and prepare the viewer for comparisons between the two.

For the Travel Times map and graphs, I decided to give them a similar colour theme because they are analysing similar aspects of the topic of Travel Times.

4.3 Proposed design

5. Appendix

Introduction to dataset: The dataset comprises two tables from Singapore’s General Household Survey 2015. The two tables are about mode of transport to work and Travelling Time to work, and are called “Table 146 Resident Working Persons Aged 15 Years and Over by Planning Area and Usual Mode of Transport to Work” and “Table 147 Resident Working Persons Aged 15 Years and Over by Planning Area and Travelling Time to Work”. The data is taken from SingStat. The numerical data is displayed in thousands.