1. Overview

Aim:

Introduction to dataset: The dataset comprises two tables from Singapore’s General Household Survey 2015, published by SingStat. The tables cover the usual mode of transport to work and the travelling time to work: “Table 146 Resident Working Persons Aged 15 Years and Over by Planning Area and Usual Mode of Transport to Work” and “Table 147 Resident Working Persons Aged 15 Years and Over by Planning Area and Travelling Time to Work”.

2. Preparation

2.1 Data processing

  1. Open RStudio and select File, then New Project. Name the new project “VA Assignment 5”. Once the project is open, select File again, then New File, and choose R Markdown from the drop-down menu. Name the R Markdown file “Assignment 5”.

  2. Download the .csv file from the SingStat website and rename it from “OutputFile.csv” to “Transport.csv”. Save it in the same working directory as the R Markdown file.

  3. Open the .csv file in Excel. Remove the unnecessary headings and notes above and below the data.

  4. Both tables have a “Total” column, so rename the first “Total” column (column B of the .csv file) to “Mode Total” and the second “Total” column (column N of the .csv file) to “Time Total”. This keeps the totals of the two tables distinguishable.

  5. Fill in the first cell (cell A1) with “Planning Area”. Save the .csv file.
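As an alternative to editing the file by hand in Excel, steps 3 to 5 could also be scripted in R. The following is a minimal sketch on toy data, not the method used for this assignment: the inline text, the skip value of 1, and the column positions 2 and 4 are assumptions for illustration (in the real file the two totals sit in columns B and N, i.e. positions 2 and 14).

```r
# Toy stand-in for the downloaded file (assumed layout): one heading line,
# a blank first header cell, and two columns that are both named "Total".
raw_text <- paste(
  "Toy heading line to be skipped",
  ",Total,Public Bus Only,Total,Up To 15 Mins",
  "Area A,10.5,4.0,10.5,2.5",
  "Area B,12.0,5.5,12.0,3.0",
  sep = "\n"
)

# skip = 1 drops the heading line; check.names = FALSE keeps the headers as typed
cleaned <- read.csv(text = raw_text, skip = 1, check.names = FALSE)

# Rename by position, because the two "Total" headers are identical
names(cleaned)[1] <- "Planning Area"
names(cleaned)[2] <- "Mode Total"
names(cleaned)[4] <- "Time Total"

print(names(cleaned))
```

The result could then be saved with write.csv(cleaned, "Transport.csv", row.names = FALSE). Renaming by position rather than by name is deliberate here, since a name-based rename cannot tell the two “Total” columns apart.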

2.2 Making the visualisation using ggplot

(all units in thousands)

# Load the tidyverse, which attaches ggplot2, dplyr, readr and related packages
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
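# tidyverse already attaches dplyr and ggplot2, so these two calls are redundant but harmless
library(dplyr)
library(ggplot2)
# Read the cleaned file from the project working directory
transport_data <- read_csv("Transport.csv")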
## 
## -- Column specification --------------------------------------------------------
## cols(
##   `Planning Area` = col_character(),
##   `Mode Total` = col_number(),
##   `Public Bus Only` = col_double(),
##   `MRT Only` = col_double(),
##   `MRT & Public Bus Only` = col_double(),
##   `Other Combinations Of MRT Or Public Bus` = col_double(),
##   `Taxi Only` = col_double(),
##   `Car Only` = col_double(),
##   `Private Chartered Bus/Van Only` = col_double(),
##   `Lorry/Pickup Only` = col_double(),
##   `Motorcycle/ Scooter Only` = col_double(),
##   Others = col_double(),
##   `No Transport Required` = col_double(),
##   `Time Total` = col_number(),
##   `Up To 15 Mins` = col_double(),
##   `16 - 30 Mins` = col_double(),
##   `31 - 45 Mins` = col_double(),
##   `46 - 60 Mins` = col_double(),
##   `More Than 60 Mins` = col_double()
## )
# Open the imported data in the RStudio data viewer to check it (interactive sessions only)
View(transport_data)
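
One possible next step, sketched here on made-up toy rows rather than the actual data, is to reshape the travelling-time columns into long format with pivot_longer() and draw a grouped bar chart. The toy values, the choice of chart, and the names toy, long and p are assumptions for illustration, not the final visualisation of this assignment.

```r
library(tidyr)
library(ggplot2)

# Toy rows standing in for transport_data (values in thousands, made up)
toy <- data.frame(
  `Planning Area` = c("Area A", "Area B"),
  `Up To 15 Mins` = c(2.5, 3.0),
  `16 - 30 Mins` = c(4.0, 5.5),
  check.names = FALSE
)

# One row per planning area and travelling-time band
long <- pivot_longer(
  toy,
  cols = -`Planning Area`,
  names_to = "Travelling Time",
  values_to = "Persons"
)

# Grouped bars: one bar per travelling-time band within each planning area
p <- ggplot(long, aes(x = `Planning Area`, y = Persons, fill = `Travelling Time`)) +
  geom_col(position = "dodge") +
  labs(y = "Resident working persons (thousands)")
```

Printing p in an R Markdown chunk would render the chart. With the real data, the same reshape would start from transport_data and select only the five travelling-time columns.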

3. Short Description

4. Final data visualisation