Student Name: Jovin Synott
School: Department of Mathematics, School of Science, RMIT, Australia
The aim is to determine whether birth rates can predict female workplace participation. Countries with low birth rates should have a female population with a greater capacity to participate in the workplace. However, there may be exceptions to the rule: some countries provide additional support that helps working mothers stay in the workforce, while in others religion can result in significantly lower labour force female participation.
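#Load the packages used throughout the analysis
library(readxl)
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(GGally)
library(vcd)
library(grid)
#Import female participation in labour force data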
female_part_wide <-
read_excel("API_SL.TLF.TOTL.FE.ZS_DS2_en_excel_v2_1681168.xls",
range = "A4:BM268")
female_part_wide <- female_part_wide[-c(1,3,4)]
#Create a long version of dataset
female_part_long <- female_part_wide %>%
pivot_longer(
'1960':'2020',
names_to = "Year",
values_to = "labor_force_fem_part",
values_drop_na = TRUE
)
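The wide-to-long reshape above is repeated for every World Bank file, so a minimal sketch on toy data may help show what it produces. The country codes and values below are hypothetical and only mirror the layout of the real files (one row per country, one column per year).

```r
library(tidyr)

# Hypothetical toy data in the same wide layout as the World Bank files
toy_wide <- data.frame(
  `Country Code` = c("AAA", "BBB"),
  `2017` = c(40.1, NA),
  `2018` = c(40.5, 35.2),
  check.names = FALSE
)

# One row per country and year; rows with a missing value are dropped
toy_long <- pivot_longer(
  toy_wide,
  cols = c(`2017`, `2018`),
  names_to = "Year",
  values_to = "labor_force_fem_part",
  values_drop_na = TRUE
)

print(toy_long)
```

Note that the year column comes out as character, which is why the later filters compare against "2018" as a string.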
#Import birth rates data
birth_rates_wide <-
read_excel("API_SP.DYN.TFRT.IN_DS2_en_excel_v2_1679135.xls",
range = "A4:BM268")
birth_rates_wide <- birth_rates_wide [-c(1,3,4)]
#Create a long version of dataset
birth_rates_long <- birth_rates_wide %>%
pivot_longer(
'1960':'2020',
names_to = "Year",
values_to = "birth_rate",
values_drop_na = TRUE
)
#Merge female participation and birth rates dataset
merge1 <- merge(female_part_long,birth_rates_long,
by=c("Country Code","Year"))
#Import region and income bracket dataset
metadata <- read_excel("API_SL.TLF.TOTL.FE.ZS_DS2_en_excel_v2_1681168.xls",
sheet = "Metadata - Countries")
#Merge with region and income bracket dataset
merge2 <- merge(merge1,metadata,
by=c("Country Code"))
#Import population data
population_wide <-
read_excel("API_SP.POP.TOTL_DS2_en_excel_v2_1678631.xls",
range = "A4:BM268")
population_wide <- population_wide [-c(1,3,4)]
#Create a long version of dataset
population_long <- population_wide %>%
pivot_longer(
'1960':'2020',
names_to = "Year",
values_to = "Population",
values_drop_na = TRUE
)
#Merge with population dataset
merge3 <- merge(merge2,population_long,
by=c("Country Code","Year"))
#Create exploratory visualisation of birth rates and labour force female participation
Observe the annual world aggregate birth rates and labour force female participation, independently.
#Create a timeseries chart for the world
time_series_chart_data <-
gather(merge2, Variable,Value, labor_force_fem_part:birth_rate)
time_series_chart_data$Variable <-
factor(time_series_chart_data$Variable,
labels = c("Birth rate","Labour force \n female participation(%)"))
time_series_chart_data <- time_series_chart_data[time_series_chart_data$TableName == 'World',]
time_series_chart_data$Year <- as.numeric(time_series_chart_data$Year)
time_series_chart <-ggplot(data = time_series_chart_data, aes(x = Year, y = Value))
time_series_chart <- time_series_chart + geom_line() + facet_grid(Variable ~ ., scales = "free", labeller = label_value) +
labs(title = "Birth rates and labour force female participation", subtitle = "Worldwide since 1990",caption = "Labour force female participation: Ages 15 and older who supply labour for production \n Birth rate: Number of children born per woman", y = "")+
theme_minimal()
time_series_chart
Observe the birth rates and labour force female participation, by country, region and population, for the most recent year’s data available (2018).
#Filter on latest year available and remove data points relating to aggregated countries
comparison_2018 <- merge3 %>% filter(Year== "2018" & !is.na(Region))
#Create scatter plot showing labour force female participation as a function of country birth rate, year = 2018
p1 <- ggplot(comparison_2018, aes(x = birth_rate, y = labor_force_fem_part))
#Add region and population
p2 <- p1 + geom_point(aes(colour = Region,size=Population)) +
labs(title = "Birth rates and labour force female participation", subtitle = "By region and population in 2018",caption = "Labour force female participation: Ages 15 and older who supply labour for production \n Birth rate: Number of children born per woman",x="Birth rate", y="Labour force female participation (%)") +
scale_color_discrete(name = "Region")+
theme_minimal()
p2
Observing that labour force female participation has outlier countries, predominantly in the Middle East and South Asia regions. Observe the distribution of labour force female participation, by country, for the most recent year’s data available (2018).
#Create boxplot of labour force female participation
box1 <- ggplot(comparison_2018,aes(y=labor_force_fem_part)) + geom_boxplot(width = .25,outlier.colour="red")+
labs(title = "Labour force female participation", subtitle = "Distribution in 2018",caption = "Labour force female participation: Ages 15 and older who supply labour for production",y="Labour force female participation(%)")+ theme_minimal()
box1
Observing that there are countries with extremely low labour force female participation rates. Should consider removing countries with a rate lower than 30%.
Similarly for birth rates, observing significantly higher birth rates in Sub-Saharan African countries. Observe distribution of birth rate, by country, for the most recent year’s data available (2018)
#Create boxplot of birth rates
box2 <- ggplot(comparison_2018,aes(y=birth_rate)) + geom_boxplot(width = .25,outlier.colour="red")+
labs(title = "Birth rates", subtitle = "Distribution in 2018",caption = "Birth rate: Number of children born per woman",y="Birth rate")+ theme_minimal()
box2
Observing that there is one country with an extremely high birth rate. Should consider removing countries with a rate higher than six.
#Remove extreme outliers from dataset
#Identify extreme outliers in countries' birth rate or labour force female participation
#Whisker ends (ymin, ymax), quartiles (lower, upper) and median (middle) are selected by name from the built boxplot data
box_labor_force_fem_part_rate <- ggplot_build(box1)
min_force_fem_part_rate <- box_labor_force_fem_part_rate$data[[1]]["ymin"]
max_force_fem_part_rate <- box_labor_force_fem_part_rate$data[[1]]["ymax"]
lower_force_fem_part_rate <- box_labor_force_fem_part_rate$data[[1]]["lower"]
upper_force_fem_part_rate <- box_labor_force_fem_part_rate$data[[1]]["upper"]
median_force_fem_part_rate <- box_labor_force_fem_part_rate$data[[1]]["middle"]
box_birth_rate <- ggplot_build(box2)
min_birth_rate <- box_birth_rate$data[[1]]["ymin"]
max_birth_rate <- box_birth_rate$data[[1]]["ymax"]
lower_birth_rate <- box_birth_rate$data[[1]]["lower"]
upper_birth_rate <- box_birth_rate$data[[1]]["upper"]
median_birth_rate <- box_birth_rate$data[[1]]["middle"]
#Removing extreme outlier data points (the strict comparisons also drop the countries sitting exactly on the whisker ends)
comparison_2018_filtered <-
filter(comparison_2018,
birth_rate > min_birth_rate$ymin,
birth_rate < max_birth_rate$ymax,
labor_force_fem_part < max_force_fem_part_rate$ymax,
labor_force_fem_part > min_force_fem_part_rate$ymin)
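The whisker ends taken from the boxplots follow the usual 1.5 x IQR rule. As a minimal sketch of that rule, assuming hypothetical birth rate values, the limits can be reproduced in base R. The filter above uses strict comparisons, so the countries sitting exactly on a whisker end are removed as well; the sketch uses inclusive comparisons to show the alternative.

```r
# Hypothetical values, for illustration only; 9.0 is deliberately extreme
x <- c(1.2, 1.5, 1.7, 1.8, 2.1, 2.4, 2.9, 3.3, 9.0)

q1 <- quantile(x, 0.25, names = FALSE)
q3 <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1

# Whisker ends: the most extreme observations within 1.5 * IQR of the quartiles
whisker_low <- min(x[x >= q1 - 1.5 * iqr])
whisker_high <- max(x[x <= q3 + 1.5 * iqr])

# Inclusive comparisons keep the observations that sit on the whisker ends
kept <- x[x >= whisker_low & x <= whisker_high]

print(c(whisker_low = whisker_low, whisker_high = whisker_high))
print(kept)
```

Here only the value 9.0 falls outside the whiskers, so eight of the nine values are kept.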
Observing the relationship between birth rates and labour force female participation by region and population alone does not easily support the development of a prediction model. ‘So it is helpful to consider other factors that make employment compatible with childbearing, and thus broaden the choices available to women.’ (Ortiz-Ospina et al. 2017, Childcare and other family-oriented policies)
One of the practical ways women retain their employment is through the support of their employers and, in many cases, support from the government in the form of family benefits. With this financial support in place, women feel they do not have to sacrifice their career to have a family.
Family benefits per GDP data can be downloaded from https://data.oecd.org/
#Import data on family spend as proportion of GDP from OECD
family_benefit_spend <- read_csv("DP_LIVE_23112020013157212.csv",
col_types = cols(INDICATOR = col_skip(),
SUBJECT = col_skip(), MEASURE = col_skip(),
FREQUENCY = col_skip(), TIME = col_character()))
#Drop rows corresponding to years before the latest available, 2015
family_benefit_spend <- family_benefit_spend[(family_benefit_spend$TIME == '2015'),]
#Drop TIME and Flag codes columns
family_benefit_spend <- family_benefit_spend [-c(2,4)]
#Rename Value column to Fam_Benefit_Per_GDP
family_benefit_spend <- family_benefit_spend %>%
rename(Fam_Benefit_Per_GDP = Value)
#Grab OECD average of family spend as proportion of GDP
OECD_avg <- family_benefit_spend[family_benefit_spend$LOCATION == 'OECD','Fam_Benefit_Per_GDP']
#Drop rows corresponding to OECD average
family_benefit_spend <- family_benefit_spend[!(family_benefit_spend$LOCATION == 'OECD'),]
#Merge family spend as proportion of GDP data
comparison_2018_filtered <- merge(x=comparison_2018_filtered,y=family_benefit_spend,
by.x='Country Code',by.y='LOCATION',all.x=TRUE)
Observe the birth rates and labour force female participation, with the proportion of GDP that is offered in family benefits, for the most recent year’s data available (2018) and with extreme outliers already removed.
Family benefits data is only available for countries that are members of the Organisation for Economic Co-operation and Development (OECD).
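Because the family benefits data covers OECD members only, the merge with all.x=TRUE leaves a missing value for every other country, and that missing value is what later identifies Non-OECD countries. A minimal sketch with hypothetical country codes and values illustrates the difference from the default inner join.

```r
# Hypothetical country codes and values, for illustration only
countries <- data.frame(
  `Country Code` = c("AAA", "BBB", "CCC"),
  birth_rate = c(1.6, 2.9, 4.8),
  check.names = FALSE
)
benefits <- data.frame(
  LOCATION = c("AAA", "BBB"),
  Fam_Benefit_Per_GDP = c(2.1, 1.4)
)

# all.x = TRUE keeps every country; those without benefits data get NA
joined <- merge(countries, benefits,
                by.x = "Country Code", by.y = "LOCATION", all.x = TRUE)

# The default merge() is an inner join and drops the unmatched country
inner <- merge(countries, benefits,
               by.x = "Country Code", by.y = "LOCATION")

# A missing benefits value then serves as the Non-OECD flag
joined$OECD_Status <- ifelse(is.na(joined$Fam_Benefit_Per_GDP), "Non-OECD", "OECD")
print(joined)
```

One caveat of this approach is that an OECD member with no 2015 value in the file would also be flagged as Non-OECD.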
p3 <- ggplot(comparison_2018_filtered,
aes(x = birth_rate, y = labor_force_fem_part,colour=Fam_Benefit_Per_GDP))+
scale_color_gradient(low="blue", high="red")+
geom_point()+
labs(title = "Birth rates, labour force female participation (2018) and Family Benefits Per GDP", subtitle = "Outlier countries removed",caption = "Labour force female participation: Ages 15 and older who supply labour for production \n Birth rate: Number of children born per woman",x="Birth rate", y="Labour force female participation (%)",colour = "Family benefits per GDP (%) \n (2015)")+
theme_minimal()+
geom_vline(xintercept = upper_birth_rate$upper ,
color = "blue")+
geom_hline(yintercept=lower_force_fem_part_rate$lower, color = "blue")
p3
Observing that the OECD member countries are predominantly countries whose labour force female participation and birth rates fall within the interquartile range.
#Create another filter on the upper left of the previous visualisation to compare OECD and non-OECD countries
comparison_2018_filtered_upper_left <-
filter(comparison_2018_filtered,
birth_rate < upper_birth_rate$upper,
labor_force_fem_part > lower_force_fem_part_rate$lower)
#Create flag for OECD countries
comparison_2018_filtered_upper_left$OECD_Status <-
ifelse(is.na(comparison_2018_filtered_upper_left$Fam_Benefit_Per_GDP), "Non-OECD", "OECD")
p4 <- ggplot(comparison_2018_filtered_upper_left,
aes(x = birth_rate, y = labor_force_fem_part,
colour=OECD_Status, label=TableName))+
geom_point()+
geom_smooth( method ="lm")+
geom_text(aes(label=ifelse(birth_rate>3 &
OECD_Status == "OECD",TableName,'')),hjust=0,vjust=0)+
labs(title = "Birth rates and labour force female participation (2018) by OECD member status", subtitle = "Linear regression model included",caption = "Labour force female participation: Ages 15 and older who supply labour for production \n Birth rate: Number of children born per woman\n Outlier countries removed",x="Birth rate", y="Labour force female participation (%)",colour = "OECD Member Status")+
theme_minimal()
p4
Observing that the linear regression line for OECD countries shows higher labour force female participation than the line for Non-OECD countries, even when birth rates increase beyond three. The OECD member country Israel appears as an interesting data point, with a relatively high birth rate and high labour force female participation.
The 95% confidence intervals widen as birth rates increase, prompting the need to explore other factors that might explain the data points with high birth rates and high labour force female participation, which are predominantly in Sub-Saharan Africa and are not OECD member countries.
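The widening of the confidence band is expected wherever data points are sparse and far from the mean of the predictor. A minimal sketch on simulated data (not the report's data) shows the same effect with lm() and predict(); the variable names and coefficients are assumptions made for illustration.

```r
set.seed(1)

# Simulated data, for illustration only: many low x values, few high ones
x <- c(runif(40, 1, 3), runif(5, 3, 6))
y <- 50 - 2 * x + rnorm(45, sd = 5)
fit <- lm(y ~ x)

# 95% confidence interval for the fitted mean at a low and a high x value
new_x <- data.frame(x = c(2, 5))
ci <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)

# Interval width at x = 2 and at x = 5
width <- ci[, "upr"] - ci[, "lwr"]
print(cbind(new_x, ci, width))
```

The interval at x = 5 is wider than at x = 2 because x = 5 lies further from the mean of the simulated x values, which mirrors the few countries with birth rates above three in the plot.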
OECD countries offer family benefits to their population, which helps keep women in the workplace. Conversely, there may be other characteristics that have the opposite effect. For example, does a country with a high percentage of women in parliament tend to have fiscal policies in place to support working mothers? Further, when such fiscal policies don’t exist, does it reflect a low availability of funds to draw upon, that is, low “GDP per capita”? And is high informal/vulnerable employment the lasting effect?
#Import GDP per capita data
GDP_per_capita_wide <-
read_excel("API_NY.GDP.PCAP.CD_DS2_en_excel_v2_1678737.xls",
range = "A4:BM268")
GDP_per_capita_wide <- GDP_per_capita_wide [-c(1,3,4)]
#Create a long version of dataset
GDP_per_capita_long <- GDP_per_capita_wide %>%
pivot_longer(
'1960':'2020',
names_to = "year",
values_to = "GDP_per_cap",
values_drop_na = TRUE
)
#Filter on GDP per capita in 2018
GDP_per_capita_2018 <- GDP_per_capita_long %>% filter(year == "2018")
#Remove redundant columns
GDP_per_capita_2018 <- GDP_per_capita_2018 [-c(2)]
#Merge GDP per capita data with comparison_2018_filtered
comparison_2018_filtered <-
merge(x=comparison_2018_filtered,
y=GDP_per_capita_2018,
by="Country Code",all.x=TRUE)
#Import women in vulnerable employment data
Fem_Vuln_Emp_wide <-
read_excel("API_SL.EMP.VULN.FE.ZS_DS2_en_excel_v2_1679692.xls",
range = "A4:BM268")
Fem_Vuln_Emp_wide <- Fem_Vuln_Emp_wide [-c(1,3,4)]
#Create a long version of dataset
Fem_Vuln_Emp_long <- Fem_Vuln_Emp_wide %>%
pivot_longer(
'1960':'2020',
names_to = "year",
values_to = "Fem_Vuln_Emp",
values_drop_na = TRUE
)
#Filter on female vulnerable employment data in 2018
Fem_Vuln_Emp_long_2018 <- Fem_Vuln_Emp_long %>% filter(year == "2018")
Fem_Vuln_Emp_long_2018 <- Fem_Vuln_Emp_long_2018 [-c(2)]
#Merge female vulnerable employment data with comparison_2018_filtered
comparison_2018_filtered <-
merge(x=comparison_2018_filtered,
y=Fem_Vuln_Emp_long_2018,
by="Country Code",all.x=TRUE)
#Import women in parliament data
women_parl_wide <-
read_excel("API_SG.GEN.PARL.ZS_DS2_en_excel_v2_1681304.xls",
range = "A4:BM268")
women_parl_wide <- women_parl_wide [-c(1,3,4)]
#Create a long version of dataset
women_parl_long <- women_parl_wide %>%
pivot_longer(
'1960':'2020',
names_to = "year",
values_to = "Women_Parl",
values_drop_na = TRUE
)
#Filter on women in parliament data in 2018
women_parl_long_2018 <- women_parl_long %>% filter(year == "2018")
women_parl_long_2018 <- women_parl_long_2018 [-c(2)]
#Merge women in parliament data with comparison_2018_filtered
comparison_2018_filtered <-
merge(x=comparison_2018_filtered,
y=women_parl_long_2018,
by="Country Code",all.x=TRUE)
#Create scatter matrix to explore relationships between all factors
p5 <- ggpairs(comparison_2018_filtered,
columns = c(4, 11, 12,13,3),columnLabels = c("Birth rate", "GDP per \n cap($US)", "Female\nVuln\nEmp(%)","Women\nparl(%)","Lab Force\nFemale(%)"))+
labs(title = "Various measures impacting birth rates/labour force female participation (2018)", subtitle = "Scatter matrix showing correlation",caption = "Lab Force Female(%): Ages 15 and older who supply labour for production \n Birth rate: Number of children born per woman \n GDP per cap($US): Gross domestic product (GDP) in US dollars divided by midyear population\n Female Vuln Emp(%): Contributing family workers as a percentage of total employment \n Women parl(%): Parliamentary seats held by women\nOutlier countries removed")+
theme_minimal()
p5
Observing a strong linear correlation between female vulnerable employment and birth rates (r=0.836). Also observing a logarithmic relationship between GDP per capita and both birth rates and female vulnerable employment.
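A logarithmic relationship can be checked by comparing the Pearson correlation before and after log-transforming GDP per capita. The sketch below uses simulated values, so the coefficients are assumptions chosen only to mimic the curved shape seen in the scatter matrix.

```r
set.seed(42)

# Simulated values, for illustration only (not the report's data)
gdp_per_cap <- exp(runif(100, log(500), log(60000)))
birth_rate <- 9 - 0.7 * log(gdp_per_cap) + rnorm(100, sd = 0.3)

# Pearson correlation on the raw scale and on the log scale
r_raw <- cor(gdp_per_cap, birth_rate)
r_log <- cor(log(gdp_per_cap), birth_rate)

print(c(raw = r_raw, log = r_log))
```

If the relationship is logarithmic, the correlation on the log scale is stronger in absolute terms than on the raw scale, which would support using log(GDP per capita) in any later model.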
#Calculate the median female vulnerable employment
median_vuln_fem_emp <-
median(comparison_2018_filtered$Fem_Vuln_Emp,na.rm = TRUE)
#Create flag for countries above/below median female vulnerable employment
comparison_2018_filtered$Fem_Vuln_Emp_Status <-
ifelse(comparison_2018_filtered$Fem_Vuln_Emp<median_vuln_fem_emp, "Below median", "Above median")
#Create scatter plot showing country birth rate, labour force female participation and female vulnerable employment, year = 2018
p7 <- ggplot(comparison_2018_filtered,
aes(x = birth_rate, y = labor_force_fem_part,
colour=Fem_Vuln_Emp_Status,label=TableName))+
geom_point()+
geom_smooth(method ="lm")+
geom_text(aes(label=ifelse(birth_rate>4
& labor_force_fem_part <35,
TableName,'')),hjust=0,vjust=0)+
labs(title = "Birth rates and labour force female participation (2018)", subtitle = "Linear regression model on vulnerable employment status included",caption = "Labour force female participation: Ages 15 and older who supply labour for production \n Birth rate: Number of children born per woman \n Female Vuln Emp(%): Contributing family workers as a percentage of total employment\nOutlier countries removed",x="Birth rate", y="Labour force female participation (%)")+
theme_minimal()+
labs(colour = "Female Vuln Emp(%)")
p7
Observing that labour force female participation increases in line with birth rates for countries with above median female vulnerable employment levels. In contrast, labour force female participation decreases as birth rates increase for countries with below median female vulnerable employment levels. Mauritania and Sudan are interesting data points, with above median female vulnerable employment and high birth rates but low labour force female participation.
The final analysis allocates each country to a cohort based on birth rate, labour force female participation, OECD member status and female vulnerable employment, to determine whether any cohort has an observed count that differs significantly from the expected count, using a chi-square test.
#Create column for row headers in crosstab
comparison_2018_filtered$birth_rate_vuln_emp_cat <-
ifelse(comparison_2018_filtered$birth_rate < median_birth_rate$middle,
ifelse(comparison_2018_filtered$Fem_Vuln_Emp_Status == "Below median","2 Low BR/\nLow FVE", "1 Low BR/\nHigh FVE"),
ifelse(comparison_2018_filtered$Fem_Vuln_Emp_Status == "Below median","3 High BR/\nLow FVE", "4 High BR/\nHigh FVE"))
#Create column for column headers in crosstab
comparison_2018_filtered$lab_force_fem_OECD_cat <-
ifelse(comparison_2018_filtered$labor_force_fem_part < median_force_fem_part_rate$middle,
ifelse(is.na(comparison_2018_filtered$Fam_Benefit_Per_GDP),"4 Low LFFP/\nNon-OECD", "3 Low LFFP/\nOECD"),
ifelse(is.na(comparison_2018_filtered$Fam_Benefit_Per_GDP),"2 High LFFP/\nNon-OECD", "1 High LFFP/\nOECD"))
#Create labels for mosaic plot
crosstab1 <- table(comparison_2018_filtered$lab_force_fem_OECD_cat, comparison_2018_filtered$birth_rate_vuln_emp_cat,
dnn = c("Labor force fem part","Birth rate"))
labs<-round(prop.table(crosstab1,2),1)
vcd::mosaic(crosstab1, shade=TRUE, pop = FALSE,
labeling= labeling_border(rot_labels = c(90,0,0,0), just_labels = c("left", "right","right", "right"),offset_labels = c(0,0,0,0) ,gp_labels = gpar(fontsize = 8),labels_varnames = c(FALSE,FALSE,FALSE,FALSE),set_varnames = c("Labor force fem part" = "","Birth rate"="")),
spacing = vcd::spacing_conditional(sp = unit(1.4, "lines"), start = unit(2, "lines"), rate = 0.4),
main = "Probability of labour force female participation|\nbirth rate & vulnerable employment",sub = "BR:Birth rate, LFFP: Labour force female participation,\n FVE: Female Vulnerable Employment, OECD: OECD member country, Non-OECD: Non OECD member country\nLow%: Below median%, High%: Above median%\nOutlier countries removed",main_gp = gpar(fontsize = 15),sub_gp = gpar(fontsize = 9),title_margins=unit(4,"lines"),margins=unit(4,"lines"),
keep_aspect_ratio=TRUE)
labeling_cells(text = labs, margin=0)(crosstab1)
Observing that countries with low birth rates, low female vulnerable employment, high labour force female participation and OECD member status are over-represented in the data. Countries with low birth rates, low female vulnerable employment, high labour force female participation and non-OECD member status are under-represented in the data. The analysis demonstrates that countries with low birth rates and low female vulnerable employment are more likely to be OECD member countries with family benefit schemes offered by their governments. More importantly, there is a negligible count of OECD member countries with high female vulnerable employment.
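The shading in the mosaic plot is based on Pearson residuals, which are the same quantities a chi-square test of independence reports. As a minimal sketch, assuming a hypothetical 2 x 2 table of counts rather than the crosstab above, the test and its residuals can be obtained with chisq.test().

```r
# Hypothetical counts, for illustration only (not the report's crosstab)
toy_tab <- matrix(
  c(30, 10,
     5, 25),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Status = c("OECD", "Non-OECD"),
    FVE = c("Low FVE", "High FVE")
  )
)

# Chi-square test of independence without continuity correction
test <- chisq.test(toy_tab, correct = FALSE)

print(test$expected)             # counts expected under independence
print(round(test$residuals, 2))  # Pearson residuals: (observed - expected) / sqrt(expected)
print(test$p.value)
```

Positive residuals mark over-represented cells and negative residuals mark under-represented cells. With the 4 x 4 crosstab in this report some expected counts may be small, in which case the chi-square approximation should be read with caution.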
It is possible to predict labour force female participation with a reasonable amount of confidence by modelling birth rates and socio-economic factors.
In countries that can evidence the provision of family support, working mothers can retain their employment and increase their number of children. However, the number of data points is relatively low, as family benefits per GDP data is only available for OECD member countries, which widens the 95% confidence interval. These are also countries that traditionally have low birth rates, with the exception of Israel.
High birth rates correlate strongly with high vulnerable employment. This indicates that working mothers in these countries, predominantly in Sub-Saharan Africa, have high birth rates, possibly for cultural reasons or possibly as an insurance policy against reduced productivity in families.