Plotting Gender Pay Gap using Plotly

In this assignment, we will explore the gender pay gap data for 2017-2019 published by the UK government.

Basic Scatterplot

A basic scatterplot is easy to make, with the added benefit of tooltips that appear when your mouse hovers over each point. Specify a scatterplot by indicating type = "scatter". Notice that the arguments for the x and y variables are specified as formulas, with the tilde operator (~) preceding the variable that you’re plotting.

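library(plotly)
# mode = "markers" requests points explicitly rather than leaving plotly to choose a mode
plot_ly(gpg17, x = ~DiffMeanHourlyPercent, y = ~MaleTopQuartile, type = "scatter", mode = "markers", alpha = 0.5)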

The plot shows that more UK companies have a positive gender pay gap (i.e. a positive difference in mean hourly pay) than a negative one.
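
The formula interface can be tried without the pay gap data. The following is a minimal sketch that uses R's built-in mtcars dataset as a stand-in (an assumption, it is not part of this assignment). It shows that a formula such as ~wt is looked up as a column of the data frame passed to plot_ly(), and that the text argument adds extra information to the tooltip. The model column is created here purely for illustration.

```r
library(plotly)

# mtcars ships with R; it is only a stand-in for gpg17 here
cars <- mtcars
cars$model <- rownames(mtcars)  # illustrative column holding the car names

# ~wt, ~mpg and ~model are evaluated as columns of `cars`
p <- plot_ly(cars, x = ~wt, y = ~mpg, text = ~model,
             type = "scatter", mode = "markers", alpha = 0.5)
p
```

Hovering over a point should show the weight, the miles per gallon and the car name.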

Scatterplot Color

We will look at the financial industry in particular, using the color argument to colour each point by its SIC group.

plot_ly(gpg17[gpg17$SICGroup!="Non-Financial",], x = ~DiffMeanHourlyPercent, y = ~MaleTopQuartile, type = "scatter", mode = "markers", color = ~factor(SICGroup), alpha = 0.5)

Most financial companies have a positive gender pay gap and a top-earning quartile that is majority male.
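
A visual impression like this can be checked with a quick count. The sketch below defines a small helper, share_true() (a hypothetical name, not from the original analysis), and runs it on made-up numbers rather than the real data.

```r
# Hypothetical helper: fraction of rows where a condition holds,
# ignoring missing values
share_true <- function(condition) {
  mean(condition, na.rm = TRUE)
}

# Made-up illustrative values, NOT taken from the pay gap data
gap      <- c(-5, 10, 20, 3)   # difference in mean hourly pay (%)
male_top <- c(40, 60, 80, 55)  # share of men in the top quartile (%)

share_true(gap > 0)                  # share with a positive pay gap
share_true(gap > 0 & male_top > 50)  # positive gap and male-majority top quartile
```

With the real data, and assuming MaleTopQuartile is stored as a percentage between 0 and 100, the same helper could be applied to conditions such as DiffMeanHourlyPercent > 0 & MaleTopQuartile > 50 on the filtered financial rows.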

Scatterplot Sizing

We will use the size argument to scale each marker according to employer size.

plot_ly(gpg17[gpg17$SICGroup!="Non-Financial",], x = ~DiffMeanHourlyPercent, y = ~MaleTopQuartile, type = "scatter", mode = "markers", color = ~factor(SICGroup), size = ~WeightedEmployerSize)
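
Markers can swamp a plot when a few values are very large. As a sketch, plot_ly() also accepts a sizes argument, a length-two vector giving the smallest and largest marker size in pixels. The example uses the built-in mtcars data as a stand-in, and the range c(5, 40) is an arbitrary choice.

```r
library(plotly)

# Built-in stand-in data: marker size follows horsepower,
# colour follows the number of cylinders
p <- plot_ly(mtcars, x = ~wt, y = ~mpg,
             type = "scatter", mode = "markers",
             color = ~factor(cyl),
             size = ~hp, sizes = c(5, 40))
p
```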

Line Graph

Let’s look at the gender pay gap over time for Barclays Bank UK PLC.

plot_ly(gpg[gpg$CompanyNumber=="09740322",], x = ~factor(Year), y = ~DiffMeanHourlyPercent, type = "scatter", mode = "lines")
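
With mode = "lines", plotly joins points in the order the rows appear, so it helps to sort by the x variable after filtering. Below is a minimal sketch of the same filter-then-plot pattern, using the built-in Orange dataset (tree growth measurements) as a stand-in for gpg.

```r
library(plotly)

# Built-in stand-in data, converted to a plain data frame
trees <- as.data.frame(Orange)

# Keep one tree, just as the example above keeps one company
one_tree <- trees[trees$Tree == "1", ]

# Sort by the x variable so the line is drawn left to right
one_tree <- one_tree[order(one_tree$age), ]

p <- plot_ly(one_tree, x = ~age, y = ~circumference,
             type = "scatter", mode = "lines+markers")
p
```

Using mode = "lines+markers" keeps the individual observations visible, which is useful when a series has only a few points.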

Multi Line Graph

We will compare the gender pay gap over time for the financial and non-financial industries.

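library(plotly)
library(tidyr)
library(dplyr)

# Average the mean hourly pay gap for each year and SIC group
newgpg <- gpg %>% 
    group_by(Year, SICGroup) %>% 
    summarise(MeanGenderGap = mean(DiffMeanHourlyPercent))
# Convert the grouped result to a plain data frame before plotting
newgpg <- as.data.frame(newgpg)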

plot_ly(newgpg, x = ~Year, y = ~MeanGenderGap, color = ~SICGroup, type = "scatter", mode = "lines")
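
The same summarise-then-plot pattern can be tried on data that ships with R. The sketch below uses the built-in ChickWeight dataset as a stand-in (an assumption, not part of the assignment). It also shows two optional refinements: na.rm = TRUE so missing values do not turn a group mean into NA, and .groups = "drop", which assumes dplyr 1.0.0 or later and returns an ungrouped result.

```r
library(plotly)
library(dplyr)

# Built-in stand-in data: chick weights over time on four diets
chicks <- as.data.frame(ChickWeight)

# One mean per (Time, Diet) pair
avg <- chicks %>%
  group_by(Time, Diet) %>%
  summarise(MeanWeight = mean(weight, na.rm = TRUE), .groups = "drop")

# One line per diet
p <- plot_ly(avg, x = ~Time, y = ~MeanWeight, color = ~Diet,
             type = "scatter", mode = "lines")
p
```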

Histogram

A histogram is great for showing how the values of a variable are distributed. Use the type = "histogram" argument. Here it shows the distribution of the gender pay gap across companies.

plot_ly(gpg17, x = ~DiffMeanHourlyPercent, type = "histogram")
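
The shape of a histogram depends on how the values are binned. As a sketch, the histogram trace accepts nbinsx to cap the number of bins and histnorm to plot proportions instead of raw counts. The example uses the built-in faithful dataset as a stand-in, and the value 20 is an arbitrary choice.

```r
library(plotly)

# Built-in stand-in data: waiting times between Old Faithful eruptions
p <- plot_ly(faithful, x = ~waiting, type = "histogram",
             nbinsx = 20,               # upper limit on the number of bins
             histnorm = "probability")  # bar heights sum to 1
p
```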

More Resources