Data readme for Lewis & Lupyan (under review)

There are three units of analysis in the paper (participant, country, and language) and three corresponding dataframes. The variables in each are described below. All scripts used to produce these dataframes are in the analyses/ folder of the repository (https://github.com/mllewis/IATLANG).

Description of variables

FILENAME: by_participant_df_tidy.csv

Each row corresponds to a participant. This dataframe is produced by analyses/study0/01_get_IAT_by_participant.R.

library(tidyverse)  # read_csv(), glimpse()
library(here)       # here()

PARTICIPANT_PATH <- here("writeup/journal/data_for_pre_review/by_participant_df_tidy.csv")
iat_behavioral_es_participant <- read_csv(PARTICIPANT_PATH)

glimpse(iat_behavioral_es_participant)

## Observations: 657,335
## Variables: 9
## $ country_code                        <chr> "US", "US", "US", "US", "US"…
## $ country_name                        <chr> "United States of America", …
## $ sex                                 <dbl> 1, 0, 0, 1, 1, 1, 0, 1, 0, 0…
## $ log_age                             <dbl> 3.135494, 3.044522, 3.332205…
## $ overall_iat_D_score                 <dbl> 0.22132003, 0.76769282, 0.45…
## $ order                               <dbl> 1, 2, 2, 1, 2, 2, 2, 2, 1, 1…
## $ explicit_dif                        <dbl> 3, 1, 0, 0, 0, 2, 4, 1, 1, 1…
## $ es_iat_sex_age_order_explicit_resid <dbl> 1.4016390, -0.5332046, -1.54…
## $ es_iat_sex_age_order_implicit_resid <dbl> -0.038736129, 0.324396577, -…

Participant demographics:

country_code - Two-letter country code of participant.
country_name - Human readable country name.
sex - Participant sex (1 = male; 0 = female).
log_age - Natural log of participant age.
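
The log_age values shown above appear to be on the natural-log scale, since they exponentiate to whole numbers. A minimal sketch of the back-transformation, using the first three values from the glimpse() output; the unit (years) is an assumption, as this readme does not state it:

```r
# First three log_age values, copied from the glimpse() output above.
log_age <- c(3.135494, 3.044522, 3.332205)

# Back-transform from the natural-log scale.
# Assumption: the underlying unit is years.
age <- exp(log_age)

round(age)
# [1] 23 21 28
```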

Dependent measures:

order - Block order (“1” = Male/Career paired first; “2” = Female/Career paired first).
overall_iat_D_score - Raw D-score on the behavioral IAT task (larger values = stronger bias to associate men with career and women with family).
explicit_dif - Difference between the responses for the words “career” and “family” to the question “How strongly do you associate the following with males and females?”, each rated on a Likert scale from female (1) to male (7). Computed as the “career” response minus the “family” response.
es_iat_sex_age_order_implicit_resid - Behavioral IAT bias (larger values = stronger bias to associate men with career and women with family), with participant age, participant sex, and block order residualized out.
es_iat_sex_age_order_explicit_resid - Explicit bias (larger values = stronger bias to associate men with career and women with family), with participant age, participant sex, and block order residualized out.
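
A residualized measure can be read as what remains of a score after a linear model has accounted for sex, age, and block order. The sketch below illustrates that idea on simulated data. The data frame `toy`, the `career` and `family` columns, all values, and the exact model formula are assumptions for illustration only; the actual computation is in analyses/study0/01_get_IAT_by_participant.R.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the participant dataframe (hypothetical values).
toy <- data.frame(
  sex     = rbinom(n, 1, 0.5),               # 1 = male, 0 = female
  log_age = log(runif(n, 18, 70)),           # natural log of age
  order   = sample(1:2, n, replace = TRUE),  # block order
  career  = sample(1:7, n, replace = TRUE),  # Likert response for "career"
  family  = sample(1:7, n, replace = TRUE)   # Likert response for "family"
)
toy$overall_iat_D_score <- rnorm(n, mean = 0.4, sd = 0.4)

# explicit_dif: "career" response minus "family" response.
toy$explicit_dif <- toy$career - toy$family

# Residualize each measure on sex, age, and block order.
# Assumption: a plain linear model with block order treated as a factor.
implicit_fit <- lm(overall_iat_D_score ~ sex + log_age + factor(order), data = toy)
explicit_fit <- lm(explicit_dif ~ sex + log_age + factor(order), data = toy)

toy$es_iat_sex_age_order_implicit_resid <- resid(implicit_fit)
toy$es_iat_sex_age_order_explicit_resid <- resid(explicit_fit)

head(toy[, c("overall_iat_D_score", "es_iat_sex_age_order_implicit_resid")])
```

By construction, residuals from such a model have mean zero and are uncorrelated with the predictors, so differences in the residualized scores are not attributable to linear effects of sex, age, or block order.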

FILENAME: by_country_df_tidy.csv

Each row corresponds to a country. This dataframe is produced by analyses/study0/02_get_IAT_by_country.R.

COUNTRY_PATH <- here("writeup/journal/data_for_pre_review/by_country_df_tidy.csv")
iat_behavioral_es_country <- read_csv(COUNTRY_PATH)

glimpse(iat_behavioral_es_country)

## Observations: 39
## Variables: 7
## $ country_code                        <chr> "AE", "AF", "AR", "AT", "AU"…
## $ country_name                        <chr> "United Arab Emirates", "Afg…
## $ n_participants                      <dbl> 581, 863, 531, 715, 13537, 1…
## $ es_iat_sex_age_order_explicit_resid <dbl> 0.37042205, 0.04984486, -0.1…
## $ es_iat_sex_age_order_implicit_resid <dbl> -0.0254883120, -0.0203192081…
## $ median_country_age                  <dbl> 30.3, 18.8, 31.7, 44.0, 38.7…
## $ per_women_stem_2012_2017            <dbl> 18.175217, NA, 11.481790, 14…

Country information:

country_code - Two-letter country code.
country_name - Human readable country name.

Behavioral IAT variables:

n_participants - Number of participants in IAT data.
es_iat_sex_age_order_implicit_resid - Behavioral IAT bias (larger values = stronger bias to associate men with career and women with family), with participant age, participant sex, and block order residualized out.
es_iat_sex_age_order_explicit_resid - Explicit bias (larger values = stronger bias to associate men with career and women with family), with participant age, participant sex, and block order residualized out.
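
One way to picture the step from the participant dataframe to the country dataframe is a per-country summary: count the participants and average their residualized scores. The sketch below works under that assumption with made-up rows; the actual aggregation, including any exclusion criteria, is defined in analyses/study0/02_get_IAT_by_country.R.

```r
# Hypothetical participant-level rows (values invented for illustration).
participants <- data.frame(
  country_code = c("US", "US", "US", "AU", "AU"),
  es_iat_sex_age_order_implicit_resid = c(0.10, -0.20, 0.40, 0.00, 0.20),
  es_iat_sex_age_order_explicit_resid = c(1.0, -0.5, 0.5, 0.3, -0.1)
)

# Mean of each residualized measure per country (assumed summary statistic).
by_country <- aggregate(
  cbind(es_iat_sex_age_order_implicit_resid,
        es_iat_sex_age_order_explicit_resid) ~ country_code,
  data = participants,
  FUN = mean
)

# Number of participants per country.
counts <- as.data.frame(
  table(country_code = participants$country_code),
  responseName = "n_participants",
  stringsAsFactors = FALSE
)

by_country <- merge(by_country, counts, by = "country_code")
by_country
```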

Additional country-level control variables:

median_country_age - Median age of the country's population, from the CIA World Factbook.
per_women_stem_2012_2017 - Percentage of women in STEM fields (values are on a 0-100 scale).

FILENAME: by_language_df_tidy.csv

Each row corresponds to a language. Language-level behavioral IAT measures come from analyses/study0/05_get_IAT_by_language.R. Language IAT measures (Study 1) come from scripts in analyses/study1b/. Occupation language bias measures (Study 2) come from scripts in analyses/study2b/.

LANGUAGE_PATH <- here("writeup/journal/data_for_pre_review/by_language_df_tidy.csv")

all_es_tidy <- read_csv(LANGUAGE_PATH)
glimpse(all_es_tidy)

## Observations: 25
## Variables: 13
## $ language_code                       <chr> "ar", "da", "de", "en", "es"…
## $ language_name                       <chr> "Arabic", "Danish", "German"…
## $ family                              <chr> "Afro-Asiatic", "Indo-Europe…
## $ n_participants                      <dbl> 581.000, 1036.000, 2498.333,…
## $ es_iat_sex_age_order_implicit_resid <dbl> -0.025488312, 0.027568505, 0…
## $ es_iat_sex_age_order_explicit_resid <dbl> 0.37042205, -0.01513054, 0.2…
## $ median_country_age                  <dbl> 30.30000, 42.20000, 44.50000…
## $ per_women_stem_2012_2017            <dbl> 18.175217, 12.480726, 12.456…
## $ lang_es_sub                         <dbl> -0.24156041, 1.20429532, 0.9…
## $ lang_es_wiki                        <dbl> 0.640110693, 0.852912632, 0.…
## $ mean_prop_distinct_occs             <dbl> 0.00000000, 0.36250000, 0.88…
## $ subt_occu_semantics_fm              <dbl> -0.011790771, 0.025935586, 0…
## $ wiki_occu_semantics_fm              <dbl> -2.017098e-02, 4.679907e-02,…

Language information:

language_code - Two-letter language code.
language_name - Human readable language name.
family - Language family from Ethnologue.

Behavioral IAT variables:

n_participants - Number of participants in IAT data from countries where this language is spoken.
es_iat_sex_age_order_implicit_resid - Behavioral IAT bias (larger values = stronger bias to associate men with career and women with family), with participant age, participant sex, and block order residualized out.
es_iat_sex_age_order_explicit_resid - Explicit bias (larger values = stronger bias to associate men with career and women with family), with participant age, participant sex, and block order residualized out.
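
The language-level behavioral measures are tied to the country-level ones: in the output above, the Arabic row carries the same values as the United Arab Emirates row of the country dataframe, and some n_participants values are non-integers, which is consistent with averaging over the countries where each language is spoken. A sketch of that idea follows. The country-to-language mapping, the placeholder codes, and all numbers except the AE row are hypothetical; the real procedure is in analyses/study0/05_get_IAT_by_language.R.

```r
# Country-level rows; only the AE values come from the output above,
# the other rows are invented for illustration.
by_country <- data.frame(
  country_code   = c("AE", "XA", "XB", "XC"),
  n_participants = c(581, 1200, 900, 3000),
  es_iat_sex_age_order_implicit_resid = c(-0.025488312, 0.02, 0.04, 0.03)
)

# Hypothetical mapping from country to language ("xx" is a placeholder code).
country_language <- data.frame(
  country_code  = c("AE", "XA", "XB", "XC"),
  language_code = c("ar", "xx", "xx", "xx")
)

merged <- merge(by_country, country_language, by = "country_code")

# Average the country-level values within each language (assumed statistic).
by_language <- aggregate(
  cbind(n_participants, es_iat_sex_age_order_implicit_resid) ~ language_code,
  data = merged,
  FUN = mean
)
by_language
```

Under this scheme, a language spoken in a single country simply inherits that country's values, while a language spoken in several countries gets their average.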

Additional country-level control variables:

median_country_age - Median age of the country's population, from the CIA World Factbook.
per_women_stem_2012_2017 - Percentage of women in STEM fields (values are on a 0-100 scale).

Study 1B variables:

lang_es_sub - Language IAT bias from subtitle-trained models (larger values = stronger bias to associate men with career and women with family).
lang_es_wiki - Language IAT bias from Wikipedia-trained models (larger values = stronger bias to associate men with career and women with family).

Study 2 variables:

mean_prop_distinct_occs - Proportion of gender-specific labels for set of words referring to occupations.
subt_occu_semantics_fm - Gender bias in language statistics for occupation terms, based on subtitle-trained models (larger value = stronger gender associations (female)).
wiki_occu_semantics_fm - Gender bias in language statistics for occupation terms, based on Wikipedia-trained models (larger value = stronger gender associations (female)).

Molly Lewis

2019-04-10
