Women in the world No. 1

1. Introduction

This is an R Markdown Notebook focusing on women and gender inequality.

In this notebook, several development indicators, related to economic inequality by gender, life expectancy, fertility, and education, are used for mapping and graphing. The datasets used here come from the Oxford Martin Programme on Global Development as well as from the World Bank World Development Indicators website.

The first section shows a time series of maps on economic inequality. It focuses on three individual years and on the Americas region.

The second section shows two series of graphs: the first uses the World Bank income-level groupings, and the second uses a selection of nine countries. This section reproduces the work of Sharon Howard, a researcher on digital history projects at the University of Sheffield (UK). I have only added three American countries to the initial list selected by Sharon.

2. Series of maps on economic inequality

First, we need to go to Our World in Data and download the CSV file containing the data used in Economic inequality by gender. Next, we upload the file to our RStudio Cloud project. Then we can proceed:

gender_gap <- read.csv("./gender-gap-in-wages.csv")
length(unique(gender_gap$Code))

## [1] 64

I selected three years to see whether economic inequality is improving or getting worse:

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# One subset per selected year
npov92 <- filter(gender_gap, Year == 1992)

npov05 <- filter(gender_gap, Year == 2005)

npov14 <- filter(gender_gap, Year == 2014)

# The fourth column holds the gender wage gap; the "Pov_" prefix is only a label
names(npov92) <- c("Entity", "Code", "Year", "Pov_1992")

names(npov05) <- c("Entity", "Code", "Year", "Pov_2005")

names(npov14) <- c("Entity", "Code", "Year", "Pov_2014")
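
The filter-and-rename steps above repeat the same pattern for each year. As a minimal sketch, the pattern could be wrapped in a small helper. Note that `subset_year` and `toy_gap` are hypothetical names, and the values are invented for illustration; they are not taken from the real CSV file.

```r
# Toy stand-in for gender_gap; all values are invented for illustration only.
toy_gap <- data.frame(
  Entity = c("Country A", "Country A", "Country B", "Country B"),
  Code   = c("AAA", "AAA", "BBB", "BBB"),
  Year   = c(1992, 2005, 1992, 2014),
  Gap    = c(20.5, 18.0, 12.3, 9.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one year and name the value column after that year.
subset_year <- function(df, yr) {
  out <- df[df$Year == yr, , drop = FALSE]
  names(out)[4] <- paste0("Pov_", yr)
  rownames(out) <- NULL
  out
}

# One data frame per selected year, stored in a named list.
years <- c(1992, 2005, 2014)
by_year <- lapply(years, subset_year, df = toy_gap)
names(by_year) <- years

by_year[["1992"]]
```

The list keeps the three subsets together, which may make it easier to add or change years later.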

We need a spatial object representing country borders. We will use the one provided by the rnaturalearth package:

library(rnaturalearth)
library(sf)

## Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3

library(ggplot2)
library(ggiraph)
world <- sf::st_as_sf(rnaturalearth::countries110)

## str(world)
length(unique(world$iso_a3))

## [1] 175

Several attribute-based joins need to be done:

library(sf)

nworld1 <-  left_join(world, npov92, by = c('iso_a3' = 'Code')) 

nworld2 <-  left_join(nworld1, npov05, by = c('iso_a3' = 'Code'))  

nworld <-   left_join(nworld2, npov14, by = c('iso_a3' = 'Code'))  
                   
# Note: the reprojected object is not assigned, so nworld keeps its original CRS
nworld %>% st_transform(crs="+proj=laea +lon_0=18.984375")
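
Since the wage data has far fewer country codes than the world layer, it can help to see which countries find no match in a join. The following is a minimal sketch in base R, using `merge()` with `all.x = TRUE` as a stand-in for `left_join()`. The two toy tables and their codes and values are invented for illustration.

```r
# Toy stand-ins for the world layer and one year of wage data (invented values).
toy_world <- data.frame(
  iso_a3 = c("COL", "MEX", "BRA", "ZZZ"),
  name   = c("Colombia", "Mexico", "Brazil", "Unknown"),
  stringsAsFactors = FALSE
)
toy_gap92 <- data.frame(
  Code     = c("COL", "MEX", "XXX"),
  Pov_1992 = c(10, 15, 5),
  stringsAsFactors = FALSE
)

# Left join: keep every row of toy_world, fill NA where no wage data exists.
joined <- merge(toy_world, toy_gap92,
                by.x = "iso_a3", by.y = "Code", all.x = TRUE)

# Countries on the map without wage data for that year.
unmatched_countries <- joined$iso_a3[is.na(joined$Pov_1992)]

# Codes in the wage data that do not appear on the map at all.
unused_codes <- setdiff(toy_gap92$Code, toy_world$iso_a3)

unmatched_countries
unused_codes
```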

We need to replace missing values with a zero value:

(nworld$Pov_1992[is.na(nworld$Pov_1992)] <- 0)

## [1] 0

(nworld$Pov_2005[is.na(nworld$Pov_2005)] <- 0)

## [1] 0

(nworld$Pov_2014[is.na(nworld$Pov_2014)] <- 0)

## [1] 0
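
Once NA values become zeros, they can no longer be told apart from a genuine zero gap. A minimal sketch of one way around this is to record a flag column before filling. The column names `Pov_1992_missing` and `label` are hypothetical, and the toy values are invented.

```r
# Toy table with one missing value and one genuine zero (invented values).
toy <- data.frame(
  iso_a3   = c("COL", "MEX", "BRA"),
  Pov_1992 = c(12.5, NA, 0),
  stringsAsFactors = FALSE
)

# Remember which values were missing before they are overwritten.
toy$Pov_1992_missing <- is.na(toy$Pov_1992)

# Same replacement as in the notebook: NA becomes 0.
toy$Pov_1992[toy$Pov_1992_missing] <- 0

# A text label that still distinguishes "no data" from a real zero.
toy$label <- ifelse(toy$Pov_1992_missing, "no data", as.character(toy$Pov_1992))

toy
```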

world.centers <- st_centroid(nworld)

world.spdf <- methods::as(nworld, 'Spatial')
world.spdf@data$id <- row.names(world.spdf@data)

world.tidy <- broom::tidy(world.spdf)

Let’s review this indicator for the Americas:

world.tidy <- dplyr::left_join(world.tidy, world.spdf@data, by='id')
# Despite its name, africa.tidy holds the polygons of the Americas
africa.tidy <- dplyr::filter(world.tidy, region_un=='Americas')

library(sf)
library(ggplot2)
library(hrbrthemes)

## NOTE: Either Arial Narrow or Roboto Condensed fonts are *required* to use these themes.

##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and

##       if Arial Narrow is not on your system, please see http://bit.ly/arialnarrow

library(colormap)
g92 <- ggplot(africa.tidy) +
  geom_polygon_interactive(
    color='black',
    aes(long, lat, group=group, fill=(Pov_1992),
        tooltip=sprintf("%s<br/>%s",iso_a3,Pov_1992))) +
 hrbrthemes::theme_ipsum() +
  colormap::scale_fill_colormap(
    colormap=colormap::colormaps$freesurface_red, reverse = T) +
  labs(title='Gender Gap in Wages in Americas in 1992', subtitle='as a percentage of average wages for men',
       caption='Source: World Bank Open Data.')

##widgetframe::frameWidget(ggiraph(code=print(g4)))
ggiraph(code=print(g92))

According to Max Roser, an Oxford scholar working on how the world is changing, “in most countries there is a substantial gender pay gap. Men tend to earn more than women in most countries. That is, there is a gender pay gap”.

“Differences in pay capture differences along many possible dimensions, including worker education, experience and occupation. When the pay gap is calculated by comparing all male workers to all female workers – irrespective of differences along these additional dimensions – the result is the ‘raw’ or ‘unadjusted’ pay gap” (Roser, 2018).

Note that these maps are interactive. You can see the data for any country by hovering over it.
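
The tooltip text in these maps is built with `sprintf()`. As a small sketch, here is what that call produces, along with a variant that rounds the value and adds a percent sign. The variant is only a suggestion, and the two toy values are invented.

```r
# Toy inputs (invented values) standing in for iso_a3 and one Pov_ column.
iso_a3 <- c("COL", "MEX")
gap    <- c(12.3456, 0)

# Same format string as in the maps: country code, line break, raw value.
tip_raw <- sprintf("%s<br/>%s", iso_a3, gap)

# Variant: one decimal place and a literal percent sign (written as %%).
tip_rounded <- sprintf("%s<br/>%.1f%%", iso_a3, gap)

tip_raw
tip_rounded
```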

“The estimates shown in these maps correspond to differences between average hourly earnings of men and women (expressed as a percentage of average hourly earnings of men), and cover all workers irrespective of whether they work full time or part time” (Roser, 2018).

These maps suggest that: (i) in most countries the gap is positive; and (ii) there are large differences in the size of this gap across countries.

Please note that, in some cases, the gender gap values shown in these maps are zero. Don’t be misled: such values are not real. They correspond to “not available” (NA) data, that is, cases where the national statistical agency does not provide data for a given year. These NA values were converted to 0 so that the mapping code would work. In summary, these values are just “artefacts” (or collateral damage, as Donald would say) of the mapping routine.

g02 <- ggplot(africa.tidy) +
  geom_polygon_interactive(
    color='black',
    aes(long, lat, group=group, fill=(Pov_2005),
        tooltip=sprintf("%s<br/>%s",iso_a3,Pov_2005))) +
 hrbrthemes::theme_ipsum() +
  colormap::scale_fill_colormap(
    colormap=colormap::colormaps$freesurface_red, reverse = T) +
  labs(title='Gender Gap in Wages in Americas in 2005', subtitle='as a percentage of average wages for men',
       caption='Source: World Bank Open Data.')

##widgetframe::frameWidget(ggiraph(code=print(g4)))
ggiraph(code=print(g02))

library(ggplot2)
g12 <- ggplot(africa.tidy) +
  geom_polygon_interactive(
    color='black',
    aes(long, lat, group=group, fill=(Pov_2014),
        tooltip=sprintf("%s<br/>%s",iso_a3,Pov_2014))) +
 hrbrthemes::theme_ipsum() +
  colormap::scale_fill_colormap(
    colormap=colormap::colormaps$freesurface_red, reverse = T) +
  labs(title='Gender Gap in Wages in Americas in 2014', subtitle='as a percentage of average wages for men',
       caption='Source: World Bank Open Data.')

##widgetframe::frameWidget(ggiraph(code=print(g4)))
ggiraph(code=print(g12))

Please note that I have not yet checked whether the statement that “in most countries the gender pay gap has decreased in the last couple of decades” is right, nor have I checked the reliability of these datasets.
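
One way to check that statement would be to compare the first and last selected years for the countries that report both. The following is only a sketch on invented values. It assumes the comparison is made before NA values are replaced by zeros, because otherwise missing data would look like a change.

```r
# Toy table with the gap in two years (invented values, NA = no data).
toy <- data.frame(
  iso_a3   = c("AAA", "BBB", "CCC", "DDD"),
  Pov_1992 = c(25, 18, NA, 10),
  Pov_2014 = c(15, 20, 12, NA),
  stringsAsFactors = FALSE
)

# Keep only countries with data in both years.
both <- toy[complete.cases(toy[, c("Pov_1992", "Pov_2014")]), ]

# Negative change means the gap narrowed between 1992 and 2014.
both$change <- both$Pov_2014 - both$Pov_1992

# Share of countries (among those with both values) where the gap decreased.
share_decreased <- mean(both$change < 0)

both
share_decreased
```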

3. Series of graphs

Before running the code below, it is necessary to download and save the following two World Bank datasets provided by Sharon Howard:

1: population indicators from 1960 (i.e., includes only countries that have data from 1960 onwards)
2: population and education indicators, all dates

library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)

wbspse <- read.csv("./wbspse_alldates_20180306.csv")
# convert wbspse to long format, 1960-2009 only
wbspse_long_all_ind <- wbspse %>%
    # num_range("X", 1960:2009) selects the columns X1960, X1961, ..., X2009
    select(IndicatorCode, IndicatorName, CountryName, CountryCode, num_range("X", 1960:2009)) %>%
    gather(year1, val, X1960:X2009) %>%
    mutate(year = as.integer(substr(year1, 2,5)), 
           decade = paste0(substr(year1, 2,4), '0s') )

head(wbspse)
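
To make the reshaping step more concrete, here is a minimal sketch of the same wide-to-long idea on a toy table, written in base R so that it runs without extra packages. The table `toy_wide` and its values are invented; the `substr()` calls are the same as in the chunk above.

```r
# Toy wide table: one row per country, one column per year (invented values).
toy_wide <- data.frame(
  CountryCode = c("COL", "NOR"),
  X1960 = c(6.7, 2.9),
  X1961 = c(6.6, 2.9),
  stringsAsFactors = FALSE
)

# Find the year columns by name: an "X" followed by four digits.
year_cols <- grep("^X[0-9]{4}$", names(toy_wide), value = TRUE)

# Stack one small data frame per year column to get the long format.
toy_long <- do.call(rbind, lapply(year_cols, function(col) {
  data.frame(
    CountryCode = toy_wide$CountryCode,
    year1 = col,
    val = toy_wide[[col]],
    stringsAsFactors = FALSE
  )
}))

# "X1961" -> characters 2 to 5 give the year, 2 to 4 plus "0s" give the decade.
toy_long$year   <- as.integer(substr(toy_long$year1, 2, 5))
toy_long$decade <- paste0(substr(toy_long$year1, 2, 4), "0s")

toy_long
```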

wbsp1960 <- read.csv("./wbsp_1960_20180306.csv", header=TRUE)

# convert wbsp1960 to long format, 1960-2009 only
wbsp1960_long_all_ind <- wbsp1960 %>%
    # num_range("X", 1960:2009) selects the columns X1960, X1961, ..., X2009
    select(IndicatorCode, IndicatorName, CountryName, CountryCode, num_range("X", 1960:2009)) %>%
    gather(year1, val, X1960:X2009) %>%
    mutate(year = as.integer(substr(year1, 2,5)), 
           decade = paste0(substr(year1, 2,4), '0s') )


head(wbsp1960)

palette <- c("#D55E00","#0072B2", "#56B4E9", "#999999", "#009E73", "#F0E442", "#8C4C46",  "#27A768", "#F5BC1A", "black")

filter_fertility <- "SP.DYN.TFRT.IN"
filter_fertility_teen <- "SP.ADO.TFRT"

filter_life <- "SP.DYN.LE00.(FE|MA).IN"

filter_primary <- "SE.PRM.ENRL.FE.ZS"
filter_secondary <- "SE.SEC.ENRL.FE.ZS"
filter_primary_teachers <- "SE.PRM.TCHR.FE.ZS"

filter_secondary_teachers <- "SE.SEC.TCHR.FE.ZS"
filter_literacy <- "SE.ADT.LITR.(FE|MA).ZS"
filter_literacy_youth <- "SE.ADT.1524.LT.FE.ZS|SE.ADT.1524.LT.MA.ZS"

filter_income <- "HIC|LIC|LMC|MIC|UMC|WLD" 

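# the "six" in these names refers to the initial list; with COL, MEX and BRA added, the patterns match nine countries
filter_six <- "TUN|PAK|NGA|JAM|NOR|FJI|COL|MEX|BRA"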

filter_sixw <- "TUN|PAK|NGA|JAM|NOR|FJI|COL|MEX|BRA|WLD"
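
All of these filters are regular expressions that are later passed to `grepl()`. The following sketch shows how the `|` alternation and the `(FE|MA)` group behave on a few example codes. The vectors `codes` and `indicators` are made up for the demonstration, and the exact-match alternative at the end is only a suggestion.

```r
filter_six  <- "TUN|PAK|NGA|JAM|NOR|FJI|COL|MEX|BRA"
filter_life <- "SP.DYN.LE00.(FE|MA).IN"

# Example inputs (chosen for illustration).
codes      <- c("COL", "NOR", "USA", "WLD")
indicators <- c("SP.DYN.LE00.FE.IN", "SP.DYN.LE00.MA.IN",
                "SP.DYN.LE00.IN", "SP.DYN.TFRT.IN")

# "|" means "or": a code matches if it contains any of the alternatives.
match_codes <- grepl(filter_six, codes)

# (FE|MA) keeps the female and male series and skips the total series.
match_life <- grepl(filter_life, indicators)

# Alternative: split the pattern and test for exact membership instead,
# which avoids accidental partial matches.
wanted <- strsplit(filter_six, "|", fixed = TRUE)[[1]]
match_exact <- codes %in% wanted

match_codes
match_life
match_exact
```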

Life expectancy for men and women

According to Sharon, “life expectancy is one of the most long-running series in the data; most countries have it from 1960 onwards”. The following graphs compare female and male life expectancy at birth in the five income groups (and the world as a whole).

Sharon states that the familiar observation that women live longer than men is not just a “Western” phenomenon, although it appears that the wealthier the country, the bigger the gap. She adds that the size of the continuing gap between the richest and poorest countries is one thing that has not changed much.

ggplot(wbsp1960_long_all_ind %>%
  filter(grepl(filter_income, CountryCode), grepl(filter_life, IndicatorCode) ),
  aes(year, val, colour=IndicatorCode)
  ) + 
  geom_line() +
  facet_wrap(~CountryName) +
  scale_colour_manual(values=palette, labels=c("SP.DYN.LE00.FE.IN"="female", "SP.DYN.LE00.MA.IN"="male"), name="Gender") +
  labs(title="Life expectancy by country income groups, 1960-2009")

ggplot(wbsp1960_long_all_ind %>%
  filter(grepl(filter_six, CountryCode), grepl(filter_life, IndicatorCode) ),
  aes(year, val, colour=IndicatorCode)
  ) + 
  geom_line() +
  facet_wrap(~CountryName) +
  scale_colour_manual(values=palette, labels=c("SP.DYN.LE00.FE.IN"="female", "SP.DYN.LE00.MA.IN"="male"), name="Gender") +
  labs(title="Life expectancy in nine countries, 1960-2009")

ggplot(wbsp1960_long_all_ind %>%
      filter(grepl(filter_six, CountryCode), grepl(filter_life, IndicatorCode))
       , aes(year, val)) +
  geom_point(aes(colour = CountryName, shape=IndicatorCode) )  +
  labs(y="years", title="Life expectancy in nine countries") +
  scale_colour_manual(values=palette, name="Countries" ) +
  scale_shape_discrete(labels=c("SP.DYN.LE00.FE.IN"="female", "SP.DYN.LE00.MA.IN"="male"), name="Gender",solid = FALSE)
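
The gap between female and male life expectancy can also be computed directly instead of being read off the graphs. Below is a minimal sketch on a toy long-format table with the same column names as `wbsp1960_long_all_ind`. The values are invented and do not come from the World Bank data.

```r
# Toy long-format data for one year (invented values).
toy_long <- data.frame(
  CountryName   = rep(c("High income", "Low income"), each = 2),
  IndicatorCode = rep(c("SP.DYN.LE00.FE.IN", "SP.DYN.LE00.MA.IN"), times = 2),
  year          = 2009L,
  val           = c(82, 77, 60, 58),
  stringsAsFactors = FALSE
)

keep <- c("CountryName", "year", "val")
female <- toy_long[toy_long$IndicatorCode == "SP.DYN.LE00.FE.IN", keep]
male   <- toy_long[toy_long$IndicatorCode == "SP.DYN.LE00.MA.IN", keep]

# Put female and male values side by side for each group and year.
gap <- merge(female, male, by = c("CountryName", "year"),
             suffixes = c("_female", "_male"))

# Positive values mean women live longer than men.
gap$gap_years <- gap$val_female - gap$val_male

gap
```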

Fertility rate, total (births per woman)

ggplot(wbsp1960_long_all_ind %>%
      filter(grepl(filter_income, CountryCode), grepl(filter_fertility, IndicatorCode))
       , aes(year, val, colour = CountryName)) +
  geom_line() +
  scale_colour_manual(values=palette, name="Countries", breaks=c("High income", "Upper middle income", "Middle income", "Lower middle income", "Low & middle income", "Low income", "World")) +
  labs(y="births per woman", title="Fertility rates by country income groups")

Adolescent fertility rate (births per 1,000 women ages 15-19)

# Note: this plot uses filter_fertility (total fertility rate);
# filter_fertility_teen (SP.ADO.TFRT) is defined above but not used here
ggplot(wbsp1960_long_all_ind %>%
      filter(grepl(filter_six, CountryCode), grepl(filter_fertility, IndicatorCode)), aes(year, val, colour = CountryName)) +
  geom_line() +
  labs(y="births per woman", title="Fertility rates in nine countries") +
  scale_colour_manual(values=palette, name="Countries" )

Primary education, pupils (% female)

ggplot(wbspse_long_all_ind %>%
      filter(grepl(filter_income, CountryCode), grepl(filter_primary, IndicatorCode), year>1969)
       , aes(year, val, colour = CountryName)) +
  geom_line() +
  labs(y="% of pupils", title="Primary education of girls by country income groups") +
  scale_colour_manual(values=palette, name="Countries", breaks=c("High income", "Upper middle income", "Middle income", "Lower middle income", "Low & middle income", "Low income", "World") )

ggplot(wbspse_long_all_ind %>%
      filter(grepl(filter_sixw, CountryCode), grepl(filter_primary, IndicatorCode), year>1969)
       , aes(year, val, colour = CountryName)) +
  geom_line() +
  labs(y="% of pupils", title="Primary education of girls in nine countries") +
  scale_colour_manual(values=palette, name="Countries" )

## Warning: Removed 5 rows containing missing values (geom_path).

Secondary education, pupils (% female)

ggplot(wbspse_long_all_ind %>%
      filter(grepl(filter_income, CountryCode), grepl(filter_secondary, IndicatorCode), year>1969)
       , aes(year, val, colour = CountryName)) +
  geom_line() +
  labs(y="% of pupils", title="Secondary education of girls by country income groups") +
  scale_colour_manual(values=palette, name="Countries", breaks=c("High income", "Upper middle income", "Middle income", "Lower middle income", "Low & middle income", "Low income", "World") )

ggplot(wbspse_long_all_ind %>%
      filter(grepl(filter_sixw, CountryCode), grepl(filter_secondary, IndicatorCode), year>1969)
       , aes(year, val, colour = CountryName)) +
  geom_line() +
  labs(y="% of pupils", title="Secondary education of girls in nine countries") +
  scale_colour_manual(values=palette, name="Countries")

## Warning: Removed 7 rows containing missing values (geom_path).

Primary education, teachers (% female)

ggplot(wbspse_long_all_ind %>%
      filter(grepl(filter_income, CountryCode), grepl(filter_primary_teachers, IndicatorCode), year>1969)
       , aes(year, val, colour = CountryName)) +
  geom_line() +
  labs(y="% of teachers", title="Female primary school teachers") +
  scale_colour_manual(values=palette, name="Countries", breaks=c("High income", "Upper middle income", "Middle income", "Lower middle income", "Low & middle income", "Low income", "World") )

## Warning: Removed 3 rows containing missing values (geom_path).

ggplot(wbspse_long_all_ind %>%
      filter(grepl(filter_sixw, CountryCode), grepl(filter_primary_teachers, IndicatorCode), year>1969)
       , aes(year, val, colour = CountryName)) +
  geom_line() +
  labs(y="% of teachers", title="Female primary school teachers in nine countries") +
  scale_colour_manual(values=palette, name="Countries" )

## Warning: Removed 71 rows containing missing values (geom_path).

Please note that this is a proof-of-concept for teaching purposes. This means the notebook is still a work in progress.

See you soon!!!