1. Summary

The aim of this visualisation is to analyse labour force data in Singapore to garner insights on monthly income. As Singapore is a small nation-state, its workers are one of its most precious resources, and it is therefore worth examining the income that Singaporean workers receive.

The data used in this analysis comes from SingStat, a repository of data maintained by the Singapore Department of Statistics, and covers the years 2009 to 2019. The dataset contains median monthly income values for various occupations in Singapore, broken down by gender.

Analysis of the data reveals some interesting insights. Clerical Support Workers is the only occupation in which female median monthly income was higher than male median monthly income, and the highest-earning occupation overall is Managers & Administrators.

2. Analysis of Median Monthly Income in Singapore (Final Data Visualisation)

2.1 Monthly Income over time

Over time, we can see how salaries have changed. The figure below shows the average of both genders’ median monthly income from 2009 to 2019, for the Managers & Administrators occupation.

2.2 Monthly Income for each Gender

Let’s take a look at median monthly income for both genders in the most recent year, 2019.

The most highly paid occupation is Managers & Administrators, as seen from the graph above. This is true for both females (represented in \(\color{pink}{\text{pink}}\)) and males (represented in \(\color{lightblue}{\text{light blue}}\)).

When we compare monthly income for each gender, we see that males earn a higher median monthly income than females in every occupation except Clerical Support Workers, and even there the gap between males’ and females’ median monthly income is small.

Using a stacked bar chart to compare both genders’ income, we confirm that the highest-earning occupation is indeed Managers & Administrators, regardless of the gender pay gap.

2.3 Monthly Income for Females

Here, we look at the median monthly income for females. The color scale represents the level of females’ income for each occupation as compared to other occupations, beginning with dark red for the highest value and then getting lighter as median monthly income for each occupation decreases.

2.4 Monthly Income for Males

Median monthly income for males is shown in this bar chart. The color scale represents the level of males’ income for each occupation as compared to other occupations, beginning with dark blue for the highest value and then getting lighter as median monthly income for each occupation decreases. Note that the Median Monthly Income for males, represented along the horizontal X-axis, is of a different scale from that of the Monthly Income for Females graph previously shown.

3. Creating the visualisation

3.1 Collecting the data

Data was taken from SingStat, by selecting two tables and selecting an appropriate time period. The data was downloaded in .csv format. The original dataset name was “Median Gross Monthly Income From Work (Including Employer CPF) Of Full-Time Employed Residents By Occupations And Sex,(June), Annual”, and the extracted dataset contained data from 2009 to 2019.

3.2 Cleaning the data

  1. Rename the .csv file, replacing the original name “OutputFile.csv” with “Monthly Income.csv”. This allows for easier reading of the data files into R.

  2. Open the Monthly Income.csv file in Microsoft Excel. Remove unnecessary spacings, headers and footnoted information. Create two new columns to the right of the Variables column (the first column). Name these columns “Occupation” and “Gender” respectively. In the Occupation column, type the formula =LEFT(A2,FIND("-",A2)-2) in cell B2, then fill the formula down through the rest of column B. In the Gender column, type the formula =REPLACE(A2,1,FIND("-",A2)+1,"") in cell C2, and fill down for the rest of the cells in column C. For a cell such as “Managers & Administrators - Male”, the first formula returns the text before the hyphen and the second returns the text after it. Copy columns B and C and paste them as values in two new blank columns. Then delete the old columns A, B and C, and save the data.
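The same split can also be done in R instead of Excel. The following is only a minimal sketch, not the method used for this report: it assumes the raw column is named Variables and that every cell uses " - " as the delimiter. The data frame raw and its values are hypothetical placeholders, not SingStat figures.

```r
library(tidyr)

# Hypothetical rows mimicking the raw "Variables" column; the numbers are made up.
raw <- data.frame(
  Variables = c("Managers & Administrators - Male",
                "Managers & Administrators - Female",
                "Clerical Support Workers - Male"),
  `2019` = c(1, 2, 3),
  check.names = FALSE  # keep "2019" as a column name instead of "X2019"
)

# Split each cell on the " - " delimiter into Occupation and Gender columns.
clean <- separate(raw, Variables, into = c("Occupation", "Gender"), sep = " - ")
print(clean)
```

Doing the split in code keeps the cleaning step reproducible, at the cost of assuming that no occupation name itself contains " - ".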

3.3 Create R Markdown project and file

  1. Launch RStudio.
  2. Go to “File”, then go to “New Project”. Select “New Directory”, then “New Project”.
  3. Name the directory “Assignment 4”. Select an appropriate location to store the working directory in by using the “Browse” button.
  4. Select “File” again, and select “New File”. Select “R Markdown”, then name the file “Assignment 4”. Make sure the default output format is set to HTML.
  5. An R Markdown file will be opened in RStudio. Select “File” and select “Save As…”. Name the file “Assignment 4 Monthly Income”.

3.4 Making the visualisation in R Markdown

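First, the libraries tidyverse, dplyr, RColorBrewer and ggplot2 are loaded, and the data is read into the variable income_data. The 2019 income values and the Occupation and Gender columns are then extracted into their own variables.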

library(tidyverse)
library(dplyr)
library(RColorBrewer)
library(ggplot2)
income_data <- read_csv("Monthly Income.csv")

income_2019 <- income_data$"2019"
Occupation <- income_data$Occupation
Gender <- income_data$Gender

With ggplot2 loaded, we can use the ggplot function to plot our first graph, Average Monthly Income over time for both genders (for the Managers & Administrators occupation). A new variable, income_data_time, is created to hold all the data except for the Gender and Occupation columns, so that we can create a time series plot. The first two rows, which correspond to the two gender rows for the Managers & Administrators occupation, are averaged and the result is transposed, so that a variable, new, holds the average of both genders’ median monthly income for each year. ggplot is then used to plot a line graph that also has scatter plot points, for greater clarity and ease of viewing.

income_data_time <- income_data %>%
  select(!(c(Gender,Occupation)))

# Average rows 1 and 2 (the two gender rows for Managers & Administrators)
income_data_time1 <- (income_data_time[1,]+income_data_time[2,])/2
transposed_time_data <- t(income_data_time1)

library(data.table)
# Turn the year row names into a Year column; the averaged values are in V1
new <- setDT(as.data.frame(transposed_time_data), keep.rownames = "Year")

ggplot(data=new, aes(x=Year, y=V1)) +
  geom_line(aes(group=1)) +
  geom_point() +
  labs(title="Averaged median Monthly Income over time, across both genders (Managers & Administrators occupation)", y="Median Monthly Income") +
  theme_light() +
  scale_y_continuous(labels=scales::dollar_format())
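The yearly average can also be computed by reshaping the data to long format, which selects the occupation by name rather than by row position. This is only a sketch of that alternative: the tibble toy is hypothetical, with placeholder values rather than SingStat figures, and the column names Income and AvgIncome are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-year extract; the income values are placeholders.
toy <- tibble(
  Occupation = c("Managers & Administrators", "Managers & Administrators"),
  Gender = c("Male", "Female"),
  `2018` = c(100, 80),
  `2019` = c(110, 90)
)

yearly_avg <- toy %>%
  filter(Occupation == "Managers & Administrators") %>%
  # One row per occupation, gender and year
  pivot_longer(cols = -c(Occupation, Gender),
               names_to = "Year", values_to = "Income") %>%
  group_by(Year) %>%
  # Average the two genders' median income within each year
  summarise(AvgIncome = mean(Income), .groups = "drop")

print(yearly_avg)
```

The resulting table has one row per year and can be passed to ggplot with x = Year and y = AvgIncome.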

Next, the Monthly Income for each Occupation graph is created. ggplot is used to plot median monthly income for each occupation using a dodged bar chart, so that viewers can have an overview of the median monthly income for each occupation and gender. theme_classic() was used for easier and clearer viewing.

ggplot(data=income_data, aes(y=reorder(Occupation,income_2019), x=income_2019, fill=Gender)) +
  geom_bar(position="dodge", stat="identity") +
  labs(title="Monthly Income for each Occupation in 2019", y="Occupation", x="Median Monthly Income") +
  scale_fill_manual(values=c("Pink","lightblue")) +
  theme_classic() +
  scale_x_continuous(labels=scales::dollar_format())

A stacked bar chart was also created to emphasise the difference in median monthly income for each occupation.

ggplot(data=income_data, aes(y=reorder(Occupation,income_2019), x=income_2019, fill=Gender)) +
  geom_bar(stat="identity") +
  scale_fill_manual(values=c("Pink","lightblue")) +
  labs(title="Monthly Income in each Occupation, for both genders (2019)", y="Occupation", x="Median Monthly Income") +
  geom_text(aes(label=paste("$",income_2019), group=Gender), position=position_stack(vjust=0.5)) +
  scale_x_continuous(labels=scales::dollar_format())
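The gender comparison read off these charts can also be checked numerically by computing the female-minus-male gap for each occupation. The sketch below assumes the same Occupation, Gender and 2019 column layout as the cleaned data. The tibble toy, its occupation labels and its values are hypothetical, and the column names Income and Gap are introduced only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical 2019 values for two placeholder occupations.
toy <- tibble(
  Occupation = c("A", "A", "B", "B"),
  Gender = c("Male", "Female", "Male", "Female"),
  `2019` = c(100, 90, 50, 55)
)

gap <- toy %>%
  select(Occupation, Gender, Income = `2019`) %>%
  # One row per occupation, with a Male and a Female column
  pivot_wider(names_from = Gender, values_from = Income) %>%
  # Positive Gap means the female median is higher than the male median
  mutate(Gap = Female - Male) %>%
  arrange(desc(Gap))

print(gap)
```

Occupations with a positive Gap appear at the top of the table, which makes any exception to the overall pattern easy to spot.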

To find pay of females across occupations, a new variable, income_data_female, was created with only Female rows retained. The values of the most recent year, 2019, were extracted and then used to plot a bar chart. RColorBrewer palette Reds was used to further highlight the median monthly income for each occupation for females.

income_data_female <- income_data %>%
  filter(Gender=="Female")

income_2019_female <- income_data_female$"2019" 

ggplot(data=income_data_female, aes(y=reorder(Occupation,income_2019_female), x=income_2019_female, fill=as.factor(income_2019_female))) +
  geom_col() +
  labs(title="Female Monthly Income for each Occupation, in 2019", x="Median Monthly Income", y="Occupation") +
  guides(fill=guide_legend(title="Median Monthly Income")) +
  scale_fill_brewer(palette="Reds") +
  scale_x_continuous(labels=scales::dollar_format())

To find pay of males across occupations, a new variable, income_data_male, was created with only Male rows retained. The values of the most recent year, 2019, were extracted and then used to plot a bar chart. RColorBrewer palette Blues was used to further highlight the median monthly income for each occupation for males.

income_data_male <- income_data %>%
  filter(Gender=="Male") 

income_2019_male <- income_data_male$"2019" 

ggplot(data=income_data_male, aes(y=reorder(Occupation,income_2019_male), x=income_2019_male, fill=as.factor(income_2019_male))) +
  geom_col() +
  labs(title="Male Monthly Income for each Occupation, in 2019", x="Median Monthly Income", y="Occupation") +
  guides(fill=guide_legend(title="Median Monthly Income")) +
  scale_fill_brewer(palette="Blues") +
  scale_x_continuous(labels=scales::dollar_format())
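Because the separate female and male charts end up with different x-axis scales, one possible alternative is a single faceted chart, where both panels share the same axis by default. The following is a sketch of that design option rather than the chart used in this report. The data frame toy is hypothetical, with placeholder occupations and values, and a continuous colour scale (scale_fill_distiller) is swapped in for the discrete one.

```r
library(ggplot2)

# Hypothetical long-format data: one row per occupation and gender.
toy <- data.frame(
  Occupation = rep(c("Occupation A", "Occupation B"), each = 2),
  Gender = rep(c("Female", "Male"), times = 2),
  Income = c(90, 100, 55, 50)
)

p <- ggplot(toy, aes(x = Income, y = reorder(Occupation, Income), fill = Income)) +
  geom_col() +
  # One panel per gender; facet_wrap keeps the x-axis fixed across panels by default
  facet_wrap(~ Gender) +
  # direction = 1 maps higher incomes to darker shades
  scale_fill_distiller(palette = "Blues", direction = 1) +
  scale_x_continuous(labels = scales::dollar_format()) +
  labs(x = "Median Monthly Income", y = "Occupation", fill = "Median Monthly Income")

print(p)
```

A shared axis makes bar lengths directly comparable between genders, although it gives up the separate red and blue colour coding of the two original charts.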

4. Major Data and Design Challenges

4.1 Collecting data

It was difficult to find appropriate datasets that corresponded to what I had in mind, because some datasets were too simplistic in that they had too few variables, and would not allow me to draw a complete picture of the situation. In the end, I used SingStat’s table builder to combine two relevant datasets about Singapore’s labour force.

4.2 Cleaning and organising data

Some of the data had variables that were not separated. For example, the Variables column initially contained variables such as “Managers & Administrators - Male” within one cell, which meant that it would be hard to do analysis that involved segmenting the data by gender. Therefore, I had to devise a way to organise the data such that it could be used for more in-depth analysis. Data cleaning was done using Excel, and further data processing was done in R Markdown.

4.3 Design challenges

The data needed to be manipulated for many of the graphs because the dimensions of the data did not work for all the graphs that I wanted to create. Hence, I had to create many additional data variables in R that held the modified data that I needed for my graphs. I decided to create multiple graphs as well, so that I could explore the topic of median monthly income in Singapore more thoroughly.

4.4 Proposed design

5. Appendix

Notes on data (from source)

  • Monthly Income data:
    • Full title: M920131 - Median Gross Monthly Income From Work (Including Employer CPF) Of Full-Time Employed Residents By Occupations And Sex,(June), Annual
    • Data last updated: 10/11/2020
    • Generated by: SingStat Table Builder
    • Date generated: 22/03/2021
    • Data are from Comprehensive Labour Force Survey. Data exclude Full-Time National Servicemen. Residents refer to Singapore Citizens and Permanent Residents. Full-Time employment (excluding full-time National Servicemen) refers to employment where normal hours of work is at least 35 hours per week. Gross Monthly Income From Work refers to income earned from employment. For employees, it refers to the gross monthly wages or salaries before deduction of employee CPF contribution and personal income tax. It comprises basic wages, overtime pay, commissions, tips, other allowance and one-twelfth of annual bonuses. For self-employed persons, gross monthly income refers to the average monthly profits from their business, trade or profession (i.e. total receipts less business expenses incurred) before deduction of income tax. Data are classified based on Singapore Standard Occupation Classification(SSOC) 2015. Data before year 2015 which were coded based on earlier versions of the SSOC were mapped to SSOC 2015 as far as possible to facilitate data comparability. As data are captured from a sample survey, year-on-year income changes are prone to fluctuations and hence should always be interpreted with caution. Income growth studied over longer periods (e.g. 5 to 10 years) smooths out these fluctuations and hence provides a more direct indication of income growth.