Here are the key claims in the two papers:
STOET AND GEARY (2018) CLAIM: At the individual level, boys tend to be relatively better at STEM than girls, which generally makes boys more likely to pursue STEM jobs. But, by expectancy-value theory, when there is less economic opportunity, girls are more likely to go against their relative strengths and pursue STEM degrees (because STEM jobs are high paying). As a result, countries with less economic opportunity have more girls pursuing STEM jobs, and thus more gender equality in STEM.
self_efficacy_diff (“The sex difference in self efficacy (boys – girls)”)
FALK AND HERMLE (2018) CLAIM: Boys and girls have different preferences. When there is more economic opportunity (GDP), people are freer to express their preferences, leading to greater gender differences in preferences in countries with lots of economic opportunity.
genderdif (composite score of “six fundamental preferences with regard to social and nonsocial domains: willingness to take risks; patience, which captures preferences over the intertemporal timing of rewards; altruism; trust (24); and positive and negative reciprocity, which capture the costly willingness to reward kind actions or to punish unkind actions, respectively.”)
# load packages: tidyverse (dplyr, readr, tidyr, tibble, forcats, ggplot2), ggrepel (geom_text_repel), robmed (fit_mediation, test_mediation, p_value)
library(tidyverse)
library(ggrepel)
library(robmed)
# get GDP 2017 data from World Bank API
gdp_data <- wbstats::wb(indicator = "NY.GDP.PCAP.CD",
startdate = 2017,
enddate = 2017) %>%
select(iso2c, value) %>%
rename(gdp_2017 = value)
pref_data <- readstata13::read.dta13("genderdifferences.dta") %>%
mutate(country_code = countrycode::countrycode(ison, "iso3n", "iso2c")) %>%
select(country_code, genderdif)
STOET_PATH <- "/Users/mollylewis/Documents/research/Projects/1_in_progress/IATLANG/exploratory_studies/7_age_controls/stoet_data.csv"
stoet_data <- read_csv(STOET_PATH) %>%
mutate(country_code = countrycode::countrycode(country_name, "country.name", "iso2c")) %>%
select(country_code, everything())
# data from Bill von Hippel
INPATH <- "data/Molly data2.csv"
country_raw <- read_csv(INPATH) %>%
janitor::clean_names() %>%
left_join(gdp_data, by = c("country_code" = "iso2c")) %>%
left_join(stoet_data) %>%
left_join(pref_data)
# save country data with GDP 2017 data merged in (unscaled)
# OUTPATH <- "country_level_data_with_GDP.csv"
# write_csv(country_raw, OUTPATH)
# scale variables
country_level <- country_raw %>%
mutate(across(where(is.numeric), ~ as.numeric(scale(.x)))) # z-score; as.numeric() drops the matrix returned by scale()
plot_data <- country_level %>%
select_if(is.numeric) %>%
select(-n_participants, -gdp_2013)
long_corr <- cor(plot_data,
use = "pairwise.complete.obs") %>%
as.data.frame() %>%
rownames_to_column("v2") %>%
gather("v1", "estimate", -v2)
long_p <- corrplot::cor.mtest(plot_data,
use = "pairwise.complete.obs")$p %>%
as.data.frame(row.names = names(plot_data)) %>%
setNames(names(plot_data)) %>%
rownames_to_column("v2") %>%
gather("v1", "p", -v2)
# display order of the variables in the correlation plot
var_order <- c("lang_es_sub", "lang_es_wiki", "subt_occu_semantics_fm",
               "wiki_occu_semantics_fm", "mean_prop_distinct_occs", "implicit_resid",
               "explicit_resid", "median_country_age", "gdp_2017", "per_women_stem",
               "gender_inequality_index_value", "science_literacy_diff", "intra_indv_diff",
               "self_efficacy_diff", "intrest_diff", "enjoy_diff", "satisfaction")
corr_df <- full_join(long_corr, long_p) %>%
mutate(estimate_char = case_when(v1 == v2 ~ "",
TRUE ~ as.character(round(estimate,2))),
estimate = case_when(v1 == v2 ~ as.numeric(NA),
TRUE ~ estimate),
estimate_color = case_when(p < .05 ~ estimate, TRUE ~ 0),
v1 = fct_relevel(v1, var_order),
v2 = fct_relevel(v2, var_order))
ggplot(corr_df, aes(v1, fct_rev(v2), fill = estimate_color)) +
geom_tile() + #rectangles for each correlation
#add actual correlation value in the rectangle
geom_text(aes(label = estimate_char), size = 3) +
scale_fill_gradient2(low ="blue", mid = "white", high = "red",
midpoint = 0, space = "Lab", guide = "colourbar",
name = "Pearson's r") +
ggtitle("Pairwise Correlation Coefficients") +
theme_classic(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.ticks = element_blank(),
legend.position = "none")
Pairwise correlations between all country-level measures. Red and blue correspond to positive and negative correlations, respectively. Non-significant correlations (p ≥ .05) are shown as white squares.
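The masking rule in the caption (cells with p ≥ .05 drawn in white) can be sketched in base R without corrplot: run cor.test() on each pair of columns using only the rows where both are observed, then set non-significant coefficients to 0. This is a minimal sketch, not the corrplot::cor.mtest() code used above; pairwise_cor_masked() is a hypothetical helper and the data are simulated.

```r
# Hypothetical helper: pairwise Pearson correlations, their p-values,
# and a copy with non-significant coefficients (p >= alpha) set to 0
pairwise_cor_masked <- function(df, alpha = 0.05) {
  vars <- names(df)
  k <- length(vars)
  r <- matrix(NA_real_, k, k, dimnames = list(vars, vars))
  p <- r
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i == j) next                             # leave the diagonal NA
      ok <- complete.cases(df[[i]], df[[j]])       # pairwise deletion
      test <- cor.test(df[[i]][ok], df[[j]][ok])   # Pearson by default
      r[i, j] <- unname(test$estimate)
      p[i, j] <- test$p.value
    }
  }
  masked <- ifelse(!is.na(p) & p < alpha, r, 0)    # "white" cells become 0
  list(r = r, p = p, masked = masked)
}

# Simulated example with one missing value
set.seed(3)
sim <- data.frame(v1 = rnorm(60))
sim$v2 <- sim$v1 + rnorm(60, sd = 0.3)  # strongly related to v1
sim$v3 <- rnorm(60)                     # unrelated noise
sim$v3[5] <- NA
res <- pairwise_cor_masked(sim)
print(round(res$masked, 2))
```

Dropping incomplete rows per pair, rather than across the whole table, keeps as many countries as possible in each cell, at the cost of cells being based on different subsets.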
Note that the mediation below uses Stoet and Geary's measure (per_women_stem); the mediation is not significant with our own women-in-STEM measure. It is not clear why, other than that our data are newer.
country_level %>%
ggplot(aes(x = per_women_stem, y = self_efficacy_diff, label = country))+
geom_point() +
geom_text_repel(size = 3) +
ylab("Gender difference in STEM Self Efficacy (Stoet & Geary, 2018)") +
xlab("Per. Women in STEM (SG measure)") +
ggtitle("STEM measure vs. Gender Dif. in STEM Self Efficacy ") +
geom_smooth(method = "lm", alpha = .2) +
theme_classic()
country_level %>%
ggplot(aes(x = lang_es_sub, y = self_efficacy_diff, label = country))+
geom_point() +
geom_text_repel(size = 3) +
ylab("Gender difference in STEM Self Efficacy (Stoet & Geary, 2018)") +
xlab("Linguistic Gender Bias\n(effect size)") +
ggtitle("Language Bias vs. Gender Dif. in STEM Self Efficacy ") +
geom_smooth(method = "lm", alpha = .2) +
theme_classic()
lm(per_women_stem ~ self_efficacy_diff + lang_es_sub, data = country_level) %>%
summary()
##
## Call:
## lm(formula = per_women_stem ~ self_efficacy_diff + lang_es_sub,
## data = country_level)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2426 -0.6839 -0.1395 0.5759 1.5049
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.008111 0.161698 0.050 0.9604
## self_efficacy_diff -0.514323 0.191907 -2.680 0.0134 *
## lang_es_sub -0.130940 0.212744 -0.615 0.5443
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8215 on 23 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.4027, Adjusted R-squared: 0.3508
## F-statistic: 7.755 on 2 and 23 DF, p-value: 0.002666
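Because all numeric variables were z-scored with scale() before modeling, the coefficients above are (approximately) standardized betas. In a simple regression of one z-scored variable on another, the slope equals Pearson's r and the intercept is zero. The intercept above (0.008) is close to, but not exactly, zero, presumably because the variables were scaled over all countries while lm() keeps only the complete cases. A short base R check on simulated data:

```r
set.seed(7)
x <- rnorm(50)
y <- 0.4 * x + rnorm(50)

# scale() returns a one-column matrix; as.numeric() drops the dims
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))

fit <- lm(zy ~ zx)
slope <- coef(fit)[["zx"]]
intercept <- coef(fit)[["(Intercept)"]]

print(c(slope = slope, pearson_r = cor(x, y), intercept = intercept))
```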
psych::mediate(x = "lang_es_sub", y = "per_women_stem", m = "self_efficacy_diff",
data = country_level, plot = T) %>%
summary()
## Call: psych::mediate(y = "per_women_stem", x = "lang_es_sub", m = "self_efficacy_diff",
## data = country_level, plot = T)
##
## Total effect estimates (c)
## per_women_stem se t df Prob
## lang_es_sub -0.47 0.15 -3.2 36 0.00287
##
## Direct effect estimates (c')
## per_women_stem se t df Prob
## lang_es_sub -0.09 0.16 -0.57 36 0.572000
## self_efficacy_diff -0.61 0.16 -3.93 36 0.000367
##
## R = 0.67 R2 = 0.45 F = 14.85 on 2 and 36 DF p-value: 1.98e-05
##
## 'a' effect estimates
## self_efficacy_diff se t df Prob
## lang_es_sub 0.61 0.13 4.72 37 3.34e-05
##
## 'b' effect estimates
## per_women_stem se t df Prob
## self_efficacy_diff -0.61 0.16 -3.93 36 0.000367
##
## 'ab' effect estimates
## per_women_stem boot sd lower upper
## lang_es_sub -0.38 -0.38 0.22 -0.81 -0.07
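Reading the psych::mediate summary above: the indirect effect is the product of the a and b paths, and the total effect c splits into the direct effect c' plus the indirect effect. Using the rounded values printed above (the small gap between a × b and the reported ab is rounding):

$$
\begin{aligned}
ab &= a \times b = 0.61 \times (-0.61) \approx -0.37 \quad (\text{reported } ab = -0.38) \\
c &= c' + ab = -0.09 + (-0.38) = -0.47
\end{aligned}
$$

The bootstrapped confidence interval for ab (lower = -0.81, upper = -0.07) excludes zero, which is what indicates a significant indirect effect of language on per_women_stem through self_efficacy_diff.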
country_level %>%
fit_mediation(
x = "lang_es_sub",
y = "per_women_stem",
m = "self_efficacy_diff") %>%
test_mediation() %>%
p_value()
## [1] 0.0074
This also holds for the Wikipedia model, and for three other measures of gender inequality (hdi, gini, ggi).
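The robmed p-value above comes from a bootstrap test of the indirect effect. As a rough illustration of the idea, and not of robmed's actual implementation, the base R sketch below computes a percentile bootstrap confidence interval for a × b. It assumes ordinary least squares paths (robmed's fit_mediation() fits robust regressions by default) and listwise deletion; boot_indirect() is a hypothetical helper and the data are simulated. Since any bootstrap resamples at random, calling set.seed() before test_mediation() would likewise make the reported p-value reproducible.

```r
# Hypothetical helper: percentile bootstrap CI for the indirect effect a * b
boot_indirect <- function(x, m, y, n_boot = 1000, conf = 0.95) {
  ok <- complete.cases(x, m, y)            # listwise deletion
  x <- x[ok]; m <- m[ok]; y <- y[ok]
  n <- length(x)
  ab_hat <- function(idx) {
    a <- coef(lm(m[idx] ~ x[idx]))[2]            # a path: x -> m
    b <- coef(lm(y[idx] ~ m[idx] + x[idx]))[2]   # b path: m -> y, controlling for x
    unname(a * b)
  }
  est <- ab_hat(seq_len(n))                              # estimate on the full sample
  boots <- replicate(n_boot, ab_hat(sample.int(n, n, replace = TRUE)))
  alpha <- 1 - conf
  ci <- quantile(boots, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(estimate = est, ci = ci)
}

# Simulated data with a true indirect effect of 0.6 * 0.5 = 0.3
set.seed(123)
n <- 200
x <- rnorm(n)
m <- 0.6 * x + rnorm(n)
y <- 0.5 * m + rnorm(n)
res <- boot_indirect(x, m, y)
print(res)
```

An interval that excludes zero is read as evidence of mediation; the percentile method is used here only because it is the simplest bootstrap interval to write down.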
country_level %>%
ggplot(aes(x = gdp_2017, y = genderdif, label = country))+
ylab("Gender Differences in Preferences (Falk & Hermle, 2018)") +
xlab("GDP") +
ggtitle("GDP vs. Gender Differences in Preferences ") +
geom_smooth(method = "lm", alpha = .2) +
geom_point() +
geom_text_repel(size = 3) +
theme_classic(base_size = 12)
country_level %>%
ggplot(aes(x = lang_es_sub, y = genderdif, label = country))+
ylab("Gender Differences in Preferences (Falk & Hermle, 2018)") +
xlab("Linguistic Gender Bias\n(effect size)") +
ggtitle("Language Bias vs. Dif. in Gender Preferences ") +
geom_smooth(method = "lm", alpha = .2) +
geom_point() +
geom_text_repel(size = 3) +
theme_classic(base_size = 12)
lm(genderdif ~ gdp_2017 + lang_es_sub, data = country_level) %>%
summary()
##
## Call:
## lm(formula = genderdif ~ gdp_2017 + lang_es_sub, data = country_level)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.18022 -0.59530 -0.03532 0.47677 1.06880
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1418 0.1414 1.003 0.326756
## gdp_2017 0.7827 0.1772 4.417 0.000218 ***
## lang_es_sub 0.1859 0.1814 1.025 0.316548
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7036 on 22 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.5903, Adjusted R-squared: 0.553
## F-statistic: 15.85 on 2 and 22 DF, p-value: 5.462e-05
lm(gdp_2017 ~ genderdif + lang_es_sub, data = country_level) %>%
summary()
##
## Call:
## lm(formula = gdp_2017 ~ genderdif + lang_es_sub, data = country_level)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.84907 -0.44668 -0.03616 0.25844 1.39812
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.06978 0.12576 -0.555 0.584576
## genderdif 0.60052 0.13595 4.417 0.000218 ***
## lang_es_sub 0.14158 0.15979 0.886 0.385167
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6163 on 22 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.5855, Adjusted R-squared: 0.5478
## F-statistic: 15.54 on 2 and 22 DF, p-value: 6.204e-05
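The two regressions above report the same t-value (4.417, p = 0.000218) for gdp_2017 predicting genderdif and for genderdif predicting gdp_2017. This is expected: with the same covariate (lang_es_sub) and the same complete cases, both coefficients test the same partial correlation, so the regressions by themselves cannot tell which way the causal arrow points. A base R sketch with simulated stand-in variables:

```r
set.seed(42)
n <- 40
z <- rnorm(n)                       # stands in for lang_es_sub
x <- 0.5 * z + rnorm(n)             # stands in for gdp_2017
y <- 0.7 * x + 0.2 * z + rnorm(n)   # stands in for genderdif

t_y_on_x <- summary(lm(y ~ x + z))$coefficients["x", "t value"]
t_x_on_y <- summary(lm(x ~ y + z))$coefficients["y", "t value"]

# Both equal the t-statistic of the partial correlation of x and y given z
r_p <- cor(resid(lm(y ~ z)), resid(lm(x ~ z)))
t_partial <- r_p * sqrt((n - 3) / (1 - r_p^2))

print(c(t_y_on_x, t_x_on_y, t_partial))
```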
Falk and Hermle's causal model is GDP -> preferences. A mediation model suggests that GDP does mediate the relationship between language and preferences:
psych::mediate(x = "lang_es_sub", y = "genderdif", m = "gdp_2017",
data = country_level, plot = T) %>%
summary()
## Call: psych::mediate(y = "genderdif", x = "lang_es_sub", m = "gdp_2017",
## data = country_level, plot = T)
##
## Total effect estimates (c)
## genderdif se t df Prob
## lang_es_sub 0.45 0.15 3.06 36 0.00422
##
## Direct effect estimates (c')
## genderdif se t df Prob
## lang_es_sub 0.07 0.14 0.53 36 5.99e-01
## gdp_2017 0.68 0.14 4.85 36 2.35e-05
##
## R = 0.72 R2 = 0.52 F = 19.29 on 2 and 36 DF p-value: 2.03e-06
##
## 'a' effect estimates
## gdp_2017 se t df Prob
## lang_es_sub 0.56 0.14 4.06 37 0.000245
##
## 'b' effect estimates
## genderdif se t df Prob
## gdp_2017 0.68 0.14 4.85 36 2.35e-05
##
## 'ab' effect estimates
## genderdif boot sd lower upper
## lang_es_sub 0.37 0.4 0.16 0.13 0.76
But a more plausible explanation, for which there is also evidence, is language -> preferences -> GDP:
psych::mediate(x = "lang_es_sub", y = "gdp_2017", m = "genderdif",
data = country_level, plot = T) %>%
summary()
## Call: psych::mediate(y = "gdp_2017", x = "lang_es_sub", m = "genderdif",
## data = country_level, plot = T)
##
## Total effect estimates (c)
## gdp_2017 se t df Prob
## lang_es_sub 0.56 0.14 4.06 36 0.000254
##
## Direct effect estimates (c')
## gdp_2017 se t df Prob
## lang_es_sub 0.29 0.12 2.42 36 2.05e-02
## genderdif 0.59 0.12 4.85 36 2.35e-05
##
## R = 0.76 R2 = 0.58 F = 25.03 on 2 and 36 DF p-value: 1.54e-07
##
## 'a' effect estimates
## genderdif se t df Prob
## lang_es_sub 0.45 0.15 3.06 37 0.00416
##
## 'b' effect estimates
## gdp_2017 se t df Prob
## genderdif 0.59 0.12 4.85 36 2.35e-05
##
## 'ab' effect estimates
## gdp_2017 boot sd lower upper
## lang_es_sub 0.26 0.27 0.11 0.11 0.51
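The psych::mediate summaries above satisfy c = c' + ab up to rounding; for example, in the language -> preferences -> GDP model, 0.29 + 0.26 = 0.55 versus a reported total effect of 0.56. For ordinary least squares fits on the same complete cases the decomposition is exact, as a short base R simulation shows (the data are simulated, not the country-level data):

```r
set.seed(11)
n <- 100
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)

c_total  <- coef(lm(y ~ x))[["x"]]   # total effect c
a        <- coef(lm(m ~ x))[["x"]]   # a path
fit_full <- coef(lm(y ~ x + m))
c_direct <- fit_full[["x"]]          # direct effect c'
b        <- fit_full[["m"]]          # b path
indirect <- a * b                    # indirect effect ab

print(c(total = c_total, direct_plus_indirect = c_direct + indirect))
```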
country_level %>%
fit_mediation(
x = "lang_es_sub",
y = "per_women_stem",
m = "genderdif") %>%
test_mediation() %>%
p_value()
## [1] 0.0177
This also holds for the Wikipedia model, as well as for ggi, hdi_value, and per_women_stem.
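Robustness checks like these can be scripted by re-estimating the same mediation model for each alternative measure. The sketch below computes only the OLS point estimate of the indirect effect (a × b) per outcome, on simulated data whose column names (lang_es_sub, genderdif, and the outcomes hdi_value, ggi, per_women_stem) merely mimic the ones used here; indirect_effect() is a hypothetical helper, and a real check would also need a significance test such as the robmed bootstrap used above.

```r
# Hypothetical helper: OLS indirect effect a * b for x -> m -> y,
# using only rows complete on all three variables (listwise deletion)
indirect_effect <- function(data, x, m, y) {
  d <- data[complete.cases(data[, c(x, m, y)]), ]
  a <- coef(lm(reformulate(x, response = m), data = d))[[x]]        # x -> m
  b <- coef(lm(reformulate(c(m, x), response = y), data = d))[[m]]  # m -> y, controlling for x
  a * b
}

# Simulated stand-in data (not the real country-level data)
set.seed(2024)
n <- 200
sim <- data.frame(lang_es_sub = rnorm(n))
sim$genderdif      <- 0.6 * sim$lang_es_sub + rnorm(n)
sim$hdi_value      <- 0.5 * sim$genderdif + rnorm(n)
sim$ggi            <- 0.4 * sim$genderdif + rnorm(n)
sim$per_women_stem <- -0.5 * sim$genderdif + rnorm(n)
sim$ggi[1:10] <- NA  # some missing values, as in the real data

outcomes <- c("hdi_value", "ggi", "per_women_stem")
effects <- sapply(outcomes, function(out)
  indirect_effect(sim, x = "lang_es_sub", m = "genderdif", y = out))
print(round(effects, 2))
```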