Original paper

Github repository

Paradigm

Introduction

This paper by Semrow et al. (2019) studies how membership in different identity categories interact. In this case the researchers find that Asian Americans who are gay are perceived as more American than straight Asian Americans. My primary research is on categories and membership of organizations and individuals with regards to various identity groups and the status attained by membership in different groups. This work interests me In that it displays how membership in one identity group (LGBTQ) influences membership in another group (American), and is useful to my field in that membership in either group affects individual’s status and labor market outcomes. This particular study’s materials do not explicitly state that the targets are Asian or Asian American. Other studies within the original paper examined Asian identity explicitly, this one examines foreign identity more broadly and thus this study may have more broad implications than others within the paper.

The key result of interest in this paper was the following interaction effect. Target individuals who were gay and whose country of origin was less accepting of gay people than the U.S. were seen as more American (relative to the condition where the country of origin was equally accepting) than straight targets whose country of origin was less accepting relative (again relative to the condition where the country of origin was equally accepting).

A 2 (sexual orientation: gay, straight; between) by􏰃 2 (country of origin: equally accepting, less accepting; within) mixed-model ANOVA on American identity revealed… a significant interaction [between target’s sexuality and country acceptance], F(1, 98) = 22.39, p < .001 (page 6).

I attempt to replicate this finding, but I must note that my replication differs from the original regarding its participants. I have recruited Amazon Mechanical Turkers instead of university students.

Methods

Power Analysis

The planned test is an F test for the interaction between target sexuality and country acceptance. I determined that 50 participants ought to be recruited. However I will note that the corresponding author of the original paper mentioned that my proposed sample size may be under powered. I justified my proposed sample with the following analysis. Again this is to replicate the interaction effect:

A 2 (sexual orientation: gay, straight; between) by􏰃 2 (country of origin: equally accepting, less accepting; within) mixed-model ANOVA on American identity revealed… a significant interaction [between target’s sexuality and country acceptance], F(1, 98) = 22.39, p < .001 (page 6).

The parameters of interest are described as follows. I chose \(\alpha =0.05\) and power of .8. The estimated results for less and equally accepting countries for a gay target are 4.82 and 4.03 respectively, whereas for a straight target the measures are 3.89 and 4.09 respectively. These estimates are for the measures of perceived Americanness - measured as the mean of three Likert style questions described below in more detail. Regarding the parameters for variance of measures and correlation across measures, I presumed that my participants would have greater variation and so I conducted analysis presuming the variance of the result for a measure is one and the correlation across measures is 0.55 (the correlation of within subject responses to the two conditions).

Planned Sample

This is a 2 by 2 study. A total sample of 52 is the planned size (26 per condition). Slightly more than 50 should be recruited in case some fail to complete the survey. Each participant will be presented with either a straight target or a gay target and will receive two conditions regarding the level of acceptance in the country of origin. So within participants the country of origin acceptance is varying and between participants the target’s sexuality varies.

The desired sample will consist of Americans using Amazon’s Mechanical Turk platform. Although the original study participants were University of Washington students.

Materials

The materials as follows will be followed as precisely as possible. Here are the Orignal Materials.

“Participants [will be] presented with a hypothetical country either named Boden or Thamen. They were told either that gay people are less welcome and accepted in Boden/Thamen than in the United States, or that they are equally welcome and accepted in Boden/Thamen and the United States. Then, they read about a gay man (“X is a gay man”) or straight man (“X is a straight man”). Participants rated the target’s American identity on three questions: “How likely is it that this person is Bodenian/Thamenian or American?,” “How Bodenian/Thamenian or American is this person?,” and “To what extent do you believe this person identifies as Bodenian/Thamenian or American?” on a scale from 1 (very [likely] Bodenian/Thamenian) to 7 (very [likely] American)…[Then] They read about another target with the same sexual orientation (“Y is a gay/straight man”) [but changed country and homophobia condition] and answered the same three questions about his American identity” (pages 5-6).

This will be conducted using a Qualtrics survey with MTurk participants. On one page of the survey participants will be presented with relevant information regarding the hypothetical country and the target individual, then on the next page of the survey this information is repeated and the three Likert scale questions are presented. Then on the following page participants will be presented with manipulation checks asking them to identify the acceptance of the country-of-origin relative to the US(“How accepting is [Target Country] of gay people compared to the U.S.”) on a scale of from 1 (Much less accepting) to 7 (Much More accepting); and the target’s sexuality with a multiple choice question: “What was the sexual orientation of the person you read about?” (with responses of Gay, Straight, or Not Specified.

Procedure/Stimuli and Challenges

Results may be confounded by the order of the country of origin conditions. Additionally participants may quickly learn about the purpose of the experiment upon being exposed to the second condition. Country names may also confound. Further confounds may also be introduced by the name of the target person. So I have taken care to randomize and fully counterbalance the presentation of the country of origin acceptance conditions, country names and target names.

Furthermore participants on Mturk may not be paying sufficient attention and fail manipulation checks. However in the original study many participants failed at least one manipulation check, this was because participants incorrectly reported the straight target’s sexuality as being “not specified”.

Analysis Plan

I plan to conduct to conduct a 2 by 2 ANOVA as follows to yield an estimate of the study’s main result:

A 2 (sexual orientation: gay, straight; between) by􏰃 2 (country of origin: equally accepting, less accepting; within) mixed-model ANOVA on American identity revealed… a significant interaction [between target’s sexuality and country acceptance], F(1, 98) = 22.39, p < .001 (page 6).

Note that the dependent variable is the mean response on the 1-7 question scale for each of the three questions. I hope to replicate the findings such that a significant p-value is yielded from the F test specified above.

I will analyze the results excluding those results for participants who fail a corresponding manipulation check. A manipulation check is regarded as a failure if the equally accepting country of origin is regarded as less accepting to the U.S. or the less accepting country of origin is regarded as equally or more accepting than the U.S. Additionally failure will occur if the target’s sexuality is incorrectly recalled.

I also plan to conduct additional analysis of the demographics of the participants I recruited. I will not examine how responses differ by key demographics, rather I am interested in noticing if the replication study’s demographics differ significantly in some respects (e.g. gender identity, race, etc.) from the original study’s demographics.

Differences from Original Study

The primary difference from the original study is the sample. This sample will consist of American mechanical turkers rather than students at a major American research university. This may make a notable difference as undergraduate students at the University of Washington should be expected to have different views of other countries, ethnicity, and sexuality than the broader American population. Additionally my sample will be smaller. Only very slight variations have been made to the materials to account for the fact that this survey is being conducted online rather than physically with paper. These slight changes to the materials have been made in consultation with the original study’s corresponding author.

Methods Addendum (Post Data Collection)

Actual Sample

The actual sample appeared as hoped. Luckily everyone passed every attention check, so the full sample of 52 participants may be used for each condition they experienced (104 in total). The demographics skewed in favor of about three quarters men and only about 5% Asian / Asian American, which notably differs from the original study’s demographics and which I further note below.

Differences from pre-data collection methods plan

None

Results

Data preparation

For data preparation it is first of all key to exclude participants who fail the manipulation checks as specified earlier. I will note that the only data preparation done before hand was changing the name of the variables for the results to “Condition_1_Results, Manipulation_1_Check”, etc. in Excel as they were not named properly upon importing the data in R otherwise.

The data analyzed below is from four participants recruited from a pilot.

####Load Relevant Libraries and Functions
library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggplot2)
df<-read.csv('Final_Data_Restricted.csv')
#drop first row of irrelevant duplicated information.
df<-df[-c(1),]

##Keep relevant information for tests, demographics the manipulations, the measurements, and the manipulation checks
dfRelevant<-(df %>%
  select(Country1Homophobia, Sexuality,Condition_1_Results,Condition_2_Results, Manipulation_1_Check, Manipulation_2_Check,Sexuality_Check_1,Sexuality_Check_2, Gender, Race,ResponseId) )

##Transform into long format
data_long <- gather(dfRelevant, condition, Americanness, Condition_1_Results:Condition_2_Results, factor_key=TRUE)
## Warning: attributes are not identical across measure variables;
## they will be dropped
#Now we only kept the first conditions homophobia since then  we know that the second is opposite.
#So create a variable that corresponds to condition's value for homophobia. In this long format, right now the condition variable is coded as "Condition_1_Results" or "Condition_2_Results"
data_long$condition<-as.character(data_long$condition)
#code such that the less accepting or homophobic country of origin condition is one and 0 otherwise
data_long$CountryAcceptance<-0
##The current condition's country is less accepting if we are on the first condition and it is homophobic or the second condition and the first country is not homophobic (since it varies within participant)
data_long$CountryAcceptance[(data_long$condition=='Condition_1_Results' & data_long$Country1Homophobia==T) | (data_long$condition=='Condition_2_Results' & data_long$Country1Homophobia==F)]<-1


##Collapse the Country and Sexuality Checks into one relevant variable to match the condition.
data_long$CountryManipulationCheck<-as.numeric(as.character(data_long$Manipulation_1_Check))
data_long$CountryManipulationCheck[data_long$condition=='Condition_2_Results']<-as.numeric(as.character(data_long$Manipulation_2_Check[data_long$condition=='Condition_2_Results']))


data_long$SexualityCheck<-data_long$Sexuality_Check_1
data_long$SexualityCheck[data_long$condition=='Condition_2_Results']<-data_long$Sexuality_Check_2[data_long$condition=='Condition_2_Results']

##Now keep only our relevant variables: The dependent variable (Americanness), Sexuality, CountryAcceptance, CountryManipulationCheck, SexualityCheck and demographics
data_long<-(data_long %>%
  select(Americanness, Sexuality,CountryAcceptance,CountryManipulationCheck, SexualityCheck,Gender, Race, ResponseId) )

#Now create variable if particpant  passed checks for country of origin acceptance condition, initalize to 0
data_long$PassedCountryCheck<-0

#now for equally accpeting countries you should answer 4 or above, recall CountryAcceptance value of 0 corresponds to equally accepting and 1 as less
data_long$PassedCountryCheck[as.numeric( as.character( data_long$CountryManipulationCheck)) * (1-data_long$CountryAcceptance) >=4 & data_long$CountryAcceptance == 0] <- 1

#For less accepting countries the answer should be less than 4
data_long$PassedCountryCheck[as.numeric( as.character( data_long$CountryManipulationCheck))  < 4 & data_long$CountryAcceptance == 1] <- 1


##Code the sexuality checks as passing if the response equals the target's sexuality
#Initialize to zero
data_long$SexualityCheckPassed<-0
data_long$SexualityCheckPassed[data_long$Sexuality == data_long$Sexuality]<-1



##now we can code if  both checks were passed by multiplying together since both ought to be one
data_long$PassedAllChecks<-0
data_long$PassedAllChecks<-data_long$PassedCountryCheck*data_long$SexualityCheckPassed




###Some Final House-keeping
##Divide Americanness by 3 in order to get the average as the variable was originally summed
data_long$Americanness<-as.numeric(data_long$Americanness)/3


#create new dataframe of those who pased checks to analyze them
data_long_passed<-data_long[data_long$PassedAllChecks==1,]

Confirmatory analysis

aov.1<- aov(Americanness~Sexuality*CountryAcceptance, data=data_long)
summary(aov.1)
##                              Df Sum Sq Mean Sq F value Pr(>F)  
## Sexuality                     1   2.57   2.565   2.205 0.1407  
## CountryAcceptance             1   7.72   7.719   6.634 0.0115 *
## Sexuality:CountryAcceptance   1   3.23   3.232   2.778 0.0987 .
## Residuals                   100 116.35   1.164                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##Note slight change below from earlier pre-registration as I did not make it a mixed-model, which I corrected for by including
#"Error(ResponseId/(CountryAcceptance))" into the code.
aov.ErrorWithin <- aov(Americanness~Sexuality*CountryAcceptance  +Sexuality+ Error(ResponseId/(CountryAcceptance)), data=data_long)
summary(aov.ErrorWithin)
## 
## Error: ResponseId
##           Df Sum Sq Mean Sq F value Pr(>F)
## Sexuality  1   2.57   2.565   1.952  0.169
## Residuals 50  65.69   1.314               
## 
## Error: ResponseId:CountryAcceptance
##                             Df Sum Sq Mean Sq F value  Pr(>F)   
## CountryAcceptance            1   7.72   7.719   7.618 0.00805 **
## Sexuality:CountryAcceptance  1   3.23   3.232   3.190 0.08017 . 
## Residuals                   50  50.66   1.013                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##now plot the Results

#First convert Country Acceptance Variable to a factor
data_long_passed$Sexuality<-as.factor(data_long_passed$Sexuality)
data_long_passed$CountryAcceptance<-as.factor(data_long_passed$CountryAcceptance)

#Now plot
ggplot(data_long_passed, aes(Sexuality, Americanness, fill = CountryAcceptance)) +
  stat_summary(geom = "bar", fun.y = mean, position = "dodge") +
  stat_summary(geom = "errorbar", fun.data = mean_se, position = "dodge") +
  xlab('Target Sexuality') + ylab('Perceived American Identity (1-7 Scale)')+
  scale_fill_discrete(name = "Accepting vs U.S.", labels = c("Equally", "Less"))

Exploratory analyses

Further Analysis will be done to compare the demographic makeup of m-turkers as compared to the original study. Additionally I examine the standard deviations of the response measurements by condition to explore why my results may not have been as significant.

##examine standard deviations
sd(data_long_passed$Americanness[data_long_passed$Sexuality=='gay' & data_long_passed$CountryAcceptance=='0'])
## [1] 0.7863146
sd(data_long_passed$Americanness[data_long_passed$Sexuality=='gay' & data_long_passed$CountryAcceptance=='1'])
## [1] 1.479623
sd(data_long_passed$Americanness[data_long_passed$Sexuality=='straight'& data_long_passed$CountryAcceptance=='0'])
## [1] 0.7244021
sd(data_long_passed$Americanness[data_long_passed$Sexuality=='straight' & data_long_passed$CountryAcceptance=='1'])
## [1] 1.125463
#Get the proportions of White and Asian / Asian Americans in the data
sum(df$Race=='White')
## [1] 41
sum(df$Race=='Asian / Asian American')
## [1] 3
#Get the number of man and woman identifiers
sum(df$Gender=='Man')
## [1] 37
sum(df$Gender=='Woman')
## [1] 15
sd(data_long_passed$Americanness[data_long_passed$Sexuality=='gay' & data_long_passed$CountryAcceptance=='0'])
## [1] 0.7863146
sd(data_long_passed$Americanness[data_long_passed$Sexuality=='gay' & data_long_passed$CountryAcceptance=='1'])
## [1] 1.479623

Discussion

My results here are very similar to the original study. The estimated interaction effect is in the same direction and of similar magnitude. We can see that their figure 3 (page 6) is remarkably similar to the plot I generated. My result however is only significant at p<0.1 (vlaue of ~0.08).

Notably though however we can see that the standard deviations in responses in my study were higher. For example, for gay individuals in the homophobic country condition the standard deviation was 1.48, higher than the original study’s estimate of 1.08.

Additionally we notice that my sample consisted of many more men and white Americans, but fewer Asian individuals relative to the original study. My sample consisted of 37 men and 15 women (original had 53 women and 45 men). I also sampled 41 White individuals and only 3 Asian / Asian American identifying individuals (the original study had 38 Asian/Asian American and 36 White participants).

Summary of Replication Attempt

Overall I conclude that the primary result of this study did replicate - even though it was not as statistically significant. I would expect that this failure to achieve p <0.05 arose from not having a large enough sample size due to it being potentially underpowered arising from incorrectly specified parameters for the power analysis. As it was noticed that the standard deviations (variance) of responses were higher in my study potentially arising from a different demographic being sampled. Though I had tried to take this into account in the power analysis by supposing the variances/standard deviation in responses for the MTurkers.

Commentary

I will note that the original authors’ did reccomend a larger sample size. The differing results here (in terms of p-values) seem to be resulting from the higher variance of response in my sampled population. Indeed my population did significantly differ demographically from theirs in terms of race and gender, but notably the original study sampled college students.