Replication of Beard & Amir (2010, Cognitive Therapy and Research)

Author

Eric Martz (emartz@stanford.edu)

Published

December 15, 2025

Introduction

Project Proposal

I chose this study because I am relatively new to psychology research as I pivot from my MS in Computer Science toward preparing for grad school in clinical psychology. As a result, I wanted to replicate an experiment related to clinical psychology, which posed some challenges because relatively few studies are single-session and can be administered entirely online. “Interpretation in Social Anxiety: When Meaning Precedes Ambiguity” by Beard & Amir is one of the few studies that meet these requirements while also aligning with my research interests in anxiety and depression. The paper examines how interpretation biases differ with individuals' level of self-reported social anxiety.

The study procedure will begin by asking participants to complete three self-report measures: the Social Phobia and Anxiety Inventory (SPAI; Turner et al. 1989), the State Trait Anxiety Inventory (STAI; Spielberger et al. 1970), and the Beck Depression Inventory (BDI; Beck and Steer 1987). They will then complete the Word Sentence Association Paradigm (WSAP). On each trial, participants look at a fixation cross for 500 ms, read a prime word that is either benign or threat-related for 500 ms, read an ambiguous sentence, and then indicate whether the prime word is related to the sentence. Reaction times, the percentage of benign primes endorsed, and the percentage of threat primes endorsed will be measured.

One potential challenge of replication will be that the author only shared five example sentences and prime words, but I intend to reach out to the author to request the full set of word-sentence pairs. Another will be coding the interface used to administer the experiment, since I have never built an experiment before (although I hope my other coding experience will help).

Update: I was able to get in touch with the author to receive the original materials, that is, the words and sentences used in the paradigm. All trials now match exactly with the trials performed in the original experiment.

You can find the original paper here: https://pmc.ncbi.nlm.nih.gov/articles/PMC2792932/

You can find my GitHub repository here: https://github.com/psych251/beard2010

You can find the preregistration here: https://osf.io/eupbf/overview

Methods

Power Analysis

In the original paper, the primary between-group comparisons showed large effects (Cohen’s d = 0.75–0.94). The table below shows the effect size for each key comparison, along with the power achieved with 26 participants per group:

Statistical Power for Key Comparisons
Effect	Cohen's d	Original p	Power
Threat bias (SA > NAC)	0.84	< .005	84.4%
Benign bias (SA > NAC)	0.75	< .02	75.6%
RT reject threat	0.82	.006	82.6%
RT endorse benign	0.94	.002	91.4%
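
The Power column above can be approximated with base R alone. The sketch below is a minimal illustration, not the report's own code: it swaps in `stats::power.t.test()` for the `pwr` package and sets `sd = 1` so that `delta` equals Cohen's d. The vector names are illustrative.

```r
# Effect sizes (Cohen's d) reported in the table above
d_values <- c(threat_bias = 0.84, benign_bias = 0.75,
              rt_reject_threat = 0.82, rt_endorse_benign = 0.94)

# Power of a two-sided, two-sample t-test with 26 participants per group.
# power.t.test() takes a raw mean difference (delta) and an sd;
# with sd = 1, delta is the standardized effect size.
achieved_power <- sapply(d_values, function(d) {
  power.t.test(n = 26, delta = d, sd = 1, sig.level = 0.05,
               type = "two.sample", alternative = "two.sided")$power
})

# Express as percentages for comparison with the table
print(round(100 * achieved_power, 1))
```

Because both functions are based on the noncentral t distribution, the printed values should agree with the table up to rounding.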

To determine the sample sizes needed to detect these effect sizes with 80%, 90%, and 95% power, we compute the following:

Effect	Cohen’s d	Target Power	n per Group	Total Analyzed	Total Recruited*
Benign bias	0.75	80%	29	58	108
Benign bias	0.75	90%	39	78	145
Benign bias	0.75	95%	48	96	178
RT reject threat	0.82	80%	25	50	93
RT reject threat	0.82	90%	33	66	123
RT reject threat	0.82	95%	40	80	149
Threat bias	0.84	80%	24	48	89
Threat bias	0.84	90%	31	62	115
Threat bias	0.84	95%	38	76	141
RT endorse benign	0.94	80%	19	38	71
RT endorse benign	0.94	90%	25	50	93
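RT endorse benign	0.94	95%	31	62	115

* Total Recruited assumes that 54% of recruited participants are retained for analysis (Total Analyzed / 0.54, rounded up).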
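
The rows of this table follow from solving the power equation for n and then inflating for exclusions. The sketch below reproduces the benign bias rows under stated assumptions: it uses base R's `power.t.test()` rather than the `pwr` package, the helper `required_n()` is a hypothetical name, and the 0.54 retention rate is taken from the comment in the report's source ("Account for 46% exclusion rate").

```r
# Hypothetical helper: per-group n for a target power, plus recruitment
# target assuming only `retention` of recruited participants are analyzed
required_n <- function(d, power, retention = 0.54) {
  n <- power.t.test(delta = d, sd = 1, power = power, sig.level = 0.05,
                    type = "two.sample", alternative = "two.sided")$n
  n_per_group <- ceiling(n)            # round up to whole participants
  total_analyzed <- 2 * n_per_group    # two equal groups
  c(n_per_group = n_per_group,
    total_analyzed = total_analyzed,
    total_recruited = ceiling(total_analyzed / retention))
}

# Benign bias (d = 0.75) at the three target power levels
plan <- t(sapply(c(0.80, 0.90, 0.95),
                 function(p) required_n(d = 0.75, power = p)))
rownames(plan) <- c("80%", "90%", "95%")
print(plan)
```

Changing `d` to 0.82, 0.84, or 0.94 gives the remaining rows in the same way.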

With our sample size of 26 participants per group (N = 52), we achieve 76–91% power (average 84%) to detect these effects at α = .05, two-tailed. This sample size matches the original study and provides adequate power to detect effects within the observed range.

While power for the benign bias effect (76%) falls slightly below the conventional 80% threshold, the shortfall is only about 4 percentage points. Given that (1) the original study found this effect significant with an identical sample size, (2) our average power across all key comparisons is 84%, (3) our primary hypothesis concerns threat bias (84% power), and (4) class budgetary restrictions prevent me from recruiting a larger sample, this sample size should be appropriate and feasible for my replication purposes.

Planned Sample

I plan to recruit 86 participants to adhere to the methods reported in the paper, which analyzed the bottom 30% and top 30% of participants based on their scores on the social anxiety scale. This yields 26 participants in each group, with the middle 40% excluded. However, I expect the final sample to be slightly smaller due to exclusions for failed attention checks. Because class budgetary restrictions prevent me from recruiting more than 86 participants, my analysis may be slightly more underpowered than the estimates above.

Participants will be recruited through Prolific. The sampling frame will consist of English-speaking adults aged 18 or older. No preselection will be applied based on social anxiety scores; participants will complete the Social Phobia Inventory (SPIN) as part of the study, and group assignment will be determined post-hoc by taking the top 30% (high social anxiety; SA group) and bottom 30% (low social anxiety; NAC group) of SPIN scores, with the middle 40% excluded from analysis.
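
The post-hoc split can be sketched in a few lines of base R. The SPIN totals below are simulated placeholders, not study data; the cutoffs use `quantile()` with inclusive comparisons, mirroring the rule described above.

```r
set.seed(1)

# Simulated SPIN totals (17 items scored 0-4, so totals range 0-68);
# placeholder values for illustration only
spin_total <- sample(0:68, size = 86, replace = TRUE)

# Within-sample 30th and 70th percentile cutoffs
cut_low  <- quantile(spin_total, 0.30)
cut_high <- quantile(spin_total, 0.70)

# Top 30% -> SA, bottom 30% -> NAC, middle 40% -> excluded
group <- ifelse(spin_total >= cut_high, "SA",
         ifelse(spin_total <= cut_low, "NAC", "exclude"))

print(table(group))
```

Because several participants can share a score at a cutoff, the two groups will not necessarily be the same size.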

Materials

From the original paper: “Word Sentence Association Paradigm We created 76 ambiguous sentences that described social situations (e.g., “People laugh after something you said”) and 34 sentences that described non-social situations (e.g., “Part of the building is blown up”). We selected two words for each ambiguous sentence: one that corresponded to a threat interpretation (e.g., “embarrassing” or “terrorist”) and one that corresponded to a benign interpretation (e.g., “funny” or “construction,” see “Appendix” for more examples). We then divided the word sentence pairs (220 total) into two sets of materials (A and B) to create two versions of the task. Sets were matched with respect to the types of situations depicted in the sentences (e.g., dating, work, performance). Within each set participants saw 55 ambiguous sentences: once paired with the threat interpretation prime (55 trials) and once with the benign interpretation prime (55 trials) for a total of 110 trials. No word–sentence pairs were repeated across sets, and the word–sentence pairs were presented in a different random order to each participant. We randomly assigned participants to each set (Set A: n = 31; Set B: n = 21).”

This procedure was followed precisely using the exact word-sentence pairs used in the original study.
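
To make the trial structure concrete, the sketch below builds one participant's trial list from a stimulus set: 55 sentences, each paired once with its threat prime and once with its benign prime, shuffled into a participant-specific order. The stimuli are placeholder strings and the column names are assumptions, not the original materials or the experiment's actual code.

```r
set.seed(42)

# Placeholder stimulus set: 55 sentences with one threat and one benign prime
stimuli <- data.frame(
  sentence    = paste("sentence", 1:55),
  threat_word = paste("threat", 1:55),
  benign_word = paste("benign", 1:55),
  stringsAsFactors = FALSE
)

# Randomly assign this participant to one of the two stimulus sets
set_assigned <- sample(c("A", "B"), size = 1)

# Each sentence appears twice: once per prime valence (110 trials)
trials <- rbind(
  data.frame(sentence = stimuli$sentence, word = stimuli$threat_word,
             valence = "threat", stringsAsFactors = FALSE),
  data.frame(sentence = stimuli$sentence, word = stimuli$benign_word,
             valence = "benign", stringsAsFactors = FALSE)
)

# Shuffle into a different random order for each participant
trials <- trials[sample(nrow(trials)), ]
rownames(trials) <- NULL

print(head(trials))
```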

Procedure

From the original paper: “Participants were assessed individually. They read and signed a consent form, provided basic demographic information, and completed the self-report measures (i.e., SPAI, STAI, BDI). Participants then completed the WSAP on a computer.

Word Sentence Association Paradigm Each WSAP trial comprised four steps. First, a fixation cross appeared on the computer screen for 500 ms. The fixation cross directed the participants’ attention toward the middle of the screen and alerted them that a trial was beginning. Second, a prime representing either a threat interpretation (e.g., “embarrassing”) or a benign interpretation (e.g., “funny”) appeared in the center of the computer screen for 500 ms. Third, an ambiguous sentence (e.g., “People laugh after something you said”) appeared and remained on the screen until participants pressed the space bar indicating that they finished reading the sentence. Finally, the computer prompted participants to press ‘#1’ on the number pad if they thought the word and sentence were related or to press ‘#3’ on the number pad if the word and sentence were not related (see Fig. 1). Participants then pressed the space bar, and the next trial began. All text appeared in black, 12 point font against a gray background.”

Key differences: Because the Social Phobia and Anxiety Inventory (SPAI) was not publicly available, I used the Social Phobia Inventory (SPIN) instead. Additionally, at the recommendation of the original author, Courtney Beard, I changed the response keys from 1 and 3 to the left and right arrows to more closely emulate the position and convenience of the number pad.

Analysis Plan

Following the original study, participants who scored in the middle 40% of social anxiety scores will be excluded. Reaction times will be excluded on a trial-by-trial basis if they fall below 50 ms or above 2000 ms.

In addition to these exclusions, I chose to add three attention checks to my trials. I will exclude participants who do not pass at least two of the three checks. Note that these were added after my Pilot B data was processed.
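
The two exclusion rules can be sketched as follows. The data frames and column names (`rt`, `checks_passed`) are hypothetical. Out-of-range reaction times are set to `NA` rather than dropping the trial, so the trial's endorsement response remains available.

```r
# Hypothetical trial-level data (reaction times in ms)
trials <- data.frame(
  subject = c("s1", "s1", "s1", "s2", "s2", "s2"),
  rt      = c(32, 640, 2500, 480, 1999, 50)
)

RT_MIN <- 50    # reaction times below this are excluded
RT_MAX <- 2000  # reaction times above this are excluded

# Trial-level rule: keep RTs within [RT_MIN, RT_MAX], otherwise mark missing
trials$rt_clean <- ifelse(trials$rt >= RT_MIN & trials$rt <= RT_MAX,
                          trials$rt, NA)

# Hypothetical attention-check tallies (out of three checks)
attention <- data.frame(subject = c("s1", "s2"), checks_passed = c(3, 1))

# Participant-level rule: keep those who passed at least two of three checks
keep <- attention$subject[attention$checks_passed >= 2]
trials_kept <- trials[trials$subject %in% keep, ]

print(trials_kept)
```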

The key analyses I will conduct are:

(1) a 2 (Group: SA, NAC) × 2 (Valence: threat, benign) × 2 (Response type: endorse, reject) × 2 (Sentence type: social, non-social) mixed ANOVA on reaction times, with follow-up ANOVAs for social vs. non-social sentences if the four-way interaction is significant;

(2) a 2 (Group) × 2 (Valence) × 2 (Sentence type) mixed ANOVA on endorsement rates; and

(3) independent-samples t-tests comparing bias scores between groups.
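
As a rough illustration of the third analysis, the sketch below computes bias scores from per-participant mean reaction times and compares groups with a Student's t-test. The data are simulated. The bias definitions (threat bias = reject-threat RT minus endorse-threat RT; benign bias = endorse-benign RT minus reject-benign RT) are an assumption inferred from the group means in Table 1 (for example, 626 − 532 = 94 for the original SA group), not a formula stated in this report.

```r
set.seed(7)
n_sa <- 26
n_nac <- 26

# Simulated per-participant mean RTs (ms) for social sentences; not real data
cell_means <- data.frame(
  group = rep(c("SA", "NAC"), times = c(n_sa, n_nac)),
  rt_endorse_threat = rnorm(n_sa + n_nac, mean = 550, sd = 150),
  rt_reject_threat  = rnorm(n_sa + n_nac, mean = 600, sd = 150),
  rt_endorse_benign = rnorm(n_sa + n_nac, mean = 600, sd = 150),
  rt_reject_benign  = rnorm(n_sa + n_nac, mean = 580, sd = 150)
)

# Assumed bias definitions (inferred from Table 1, see lead-in)
cell_means$threat_bias <- cell_means$rt_reject_threat - cell_means$rt_endorse_threat
cell_means$benign_bias <- cell_means$rt_endorse_benign - cell_means$rt_reject_benign

# Student's t-test (equal variances), which gives df = n_sa + n_nac - 2
threat_test <- t.test(threat_bias ~ group, data = cell_means, var.equal = TRUE)
benign_test <- t.test(benign_bias ~ group, data = cell_means, var.equal = TRUE)

print(threat_test)
```

With 26 participants per group the test has 50 degrees of freedom, matching the t(50) statistics reported for the original study.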

Differences from Original Study

The most significant deviation from the original study is the different social anxiety inventory used. Because the Social Phobia and Anxiety Inventory (SPAI) is not publicly accessible, I used the Social Phobia Inventory (SPIN) instead. Additionally, the experiment is conducted over Prolific rather than in person as in the original study. Attention checks were added during the WSAP task to exclude participants who were not paying attention. As a result, although the planned recruitment matches the original sample size, the analyzed sample will likely be slightly smaller once participants who failed the attention checks are excluded. Lastly, after my Pilot B was completed, I received guidance from the original author to change the response keys from 1 and 3 to the left and right arrows for ease of use. I do not expect any of these changes to significantly affect the outcome of the study, except that the study will be more underpowered if the sample size is substantially smaller.

Actual Sample

A total of 84 participants completed the study, 2 of whom were excluded for failing the attention checks. Twenty-five participants were assigned to the social anxiety group, and 26 were assigned to the non-anxious control group.

Differences from pre-data collection methods plan

After Pilot B, I decided to change the response keys from 1 and 3 to left and right arrows. I also added attention checks. Otherwise, my methods remained the same.

Results

Data preparation

Sample Size and Exclusions
Stage	N	% of Total
Total recruited	84	100.0
Excluded: Incomplete data	0	0.0
Excluded: Attention checks failed	2	2.4
After quality exclusions	82	97.6
Excluded: Middle 40% SPIN scores	33	40.2
Final analyzed sample	51	60.7
SA group (high social anxiety)	25	49.0
NAC group (low social anxiety)	26	51.0

Demographics by Group
Characteristic	SA (n = 25)	NAC (n = 26)
Age Distribution
Age: 18-24	8 (32%)	2 (7.7%)
Age: 25-34	8 (32%)	10 (38.5%)
Age: 35-44	3 (12%)	9 (34.6%)
Age: 45-54	4 (16%)	4 (15.4%)
Age: 55-64	1 (4%)	0 (0%)
Age: 65+	1 (4%)	1 (3.8%)
Gender Distribution
Gender: Female	13 (52%)	16 (61.5%)
Gender: Male	12 (48%)	10 (38.5%)
Gender: Non-binary/Other	0 (0%)	0 (0%)
Gender: Prefer not to say	0 (0%)	0 (0%)

Confirmatory analysis

Table 1: Descriptive Statistics - Original vs Replication
Measure	Original SA M (SD)	Original NAC M (SD)	Replication SA M (SD)	Replication NAC M (SD)
Social threat endorsement (%)	59 (18)	30 (18)	64 (15)	55 (19)
Social benign endorsement (%)	52 (19)	71 (14)	66 (16)	70 (20)
Non-social threat endorsement (%)	59 (18)	49 (23)	74 (15)	70 (24)
Non-social benign endorsement (%)	64 (15)	71 (18)	73 (16)	72 (22)
RT: Endorse threat (social, ms)	532 (163)	485 (186)	583 (239)	702 (260)
RT: Endorse benign (social, ms)	657 (226)	460 (199)	593 (232)	637 (248)
RT: Reject threat (social, ms)	626 (267)	447 (169)	653 (245)	746 (272)
RT: Reject benign (social, ms)	577 (200)	507 (228)	634 (288)	771 (298)
Threat bias score (social, ms)	94 (203)	-39 (101)	71 (162)	45 (182)
Benign bias score (social, ms)	80 (181)	-47 (166)	-42 (189)	-130 (194)
RT: Endorse threat (non-social, ms)	550 (183)	434 (121)	557 (234)	703 (233)
RT: Endorse benign (non-social, ms)	538 (183)	477 (198)	551 (213)	672 (262)
RT: Reject threat (non-social, ms)	571 (187)	470 (198)	568 (278)	741 (320)
RT: Reject benign (non-social, ms)	662 (286)	513 (235)	657 (305)	762 (377)

Table 2: Key Statistical Effects - Original vs Replication
Effect	Original Test	p	Sig	Replication Test	p	Sig	Replicated?
Threat bias	t(50) = 2.98	< .005	**	t(49) = -0.55	0.587	ns	No
Benign bias	t(50) = 2.64	< .02	*	t(48) = -1.63	0.11	ns	No
Threat endorsement	F(1,50) = 34.26	< .001	***	F(1,49) = 3.55	0.0657	ns	No
Benign endorsement	F(1,50) = 16.64	< .001	***	F(1,49) = 0.37	0.547	ns	No
SA: Threat vs Benign endorsement	t(25) = 1.39	.16	ns	t(24) = -0.67	0.508	ns	Yes
NAC: Benign vs Threat endorsement	t(25) = -10.56	< .001	***	t(25) = -3.05	0.00533	**	Yes
* p < .05, ** p < .01, *** p < .001, ns = not significant
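
The effect sizes used in the power analysis can be recovered from the original t statistics in Table 2. The sketch below assumes two equal-sized groups, for which Cohen's d is approximately 2t / √df; the helper name `t_to_d()` is hypothetical.

```r
# Convert an independent-samples t statistic to Cohen's d,
# assuming two groups of equal size: d = 2t / sqrt(df)
t_to_d <- function(t, df) 2 * t / sqrt(df)

# Original between-group bias tests from Table 2
original <- data.frame(
  effect = c("Threat bias", "Benign bias"),
  t      = c(2.98, 2.64),
  df     = c(50, 50)
)
original$d <- round(t_to_d(original$t, original$df), 2)

print(original)
```

The resulting values match the d = 0.84 and d = 0.75 listed for threat bias and benign bias in the power analysis.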

Exploratory analyses

From the confirmatory analyses, we see that there was a much smaller difference in social threat endorsement between the two groups than in the original study. To better understand these results, we look into the effect of this group assignment split. First, we see whether using a continuous scale of SPIN scores and including all participants yields similar results.
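
A continuous analysis of this kind can be sketched as a correlation between SPIN totals and endorsement rates across all retained participants. The data below are simulated, and the choice of a Pearson correlation via `cor.test()` is an assumption about how such an analysis might be run, not a description of the report's exact method.

```r
set.seed(3)
n <- 82  # all participants retained after quality exclusions

# Simulated SPIN totals and social threat endorsement rates (proportions);
# placeholder values for illustration only
spin_total <- sample(0:68, size = n, replace = TRUE)
threat_endorse <- 0.5 + 0.002 * spin_total + rnorm(n, mean = 0, sd = 0.15)
threat_endorse <- pmin(pmax(threat_endorse, 0), 1)  # clamp to [0, 1]

# Pearson correlation using the full, continuous range of SPIN scores
fit <- cor.test(spin_total, threat_endorse)

print(fit)
```

Using the full sample avoids discarding the middle 40% of scores, at the cost of departing from the original extreme-groups design.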

We then look at reaction time data to see if the continuous scale reveals any different trends.

Since there was no significant effect of SPIN scores on threat and benign reaction time biases in the confirmatory analyses, we look to see if there were significant differences in reaction time by age.

Lastly, we look at the sample means of social anxiety scores to see if the sample mean was significantly more anxious than community norms.

Discussion

Summary of Replication Attempt

This experiment resulted in a partial replication of the original study’s findings. Consistent with the original findings, this replication found that while there were no significant differences in endorsement rates of the threat and benign associations for the social anxiety (SA) group, the non-anxious control (NAC) group endorsed benign associations significantly more than threat associations. However, my attempt failed to replicate the significant between-group differences shown in the original results, namely that in comparison to the NAC group, the SA group was significantly more likely to endorse threat associations, less likely to endorse benign associations, faster to endorse social threat associations, and slower to reject social threat associations.

Commentary

In our exploratory analysis, we aimed to better understand why the between-group differences did not replicate. One surprising result from the confirmatory analysis was the high social threat endorsement rate in the NAC group, which was almost twice as high as in the original experiment. There are two likely reasons for this difference, both of which were methodological deviations from the original study. First, the original study assigned groups based on community norms of the Social Phobia and Anxiety Inventory, such that the NAC group scored below the 30th percentile on the SPAI and the SA group scored above the 70th percentile. Due to sample size constraints as well as the lack of available community norms for the public inventory used in the replication, groups were assigned based on within-sample percentiles. We see in our exploratory analysis that the sample mean anxiety score was significantly higher than the community mean reported in a study of adolescents (Ranta et al., 2007). Considering that the clinical cutoff for social anxiety in that study was a score of 19 and the sample mean here was 24, this sample appears to have been relatively high in social anxiety, which may explain the lack of significant differences between the bottom 30 percent and top 30 percent of the sample.

Second, the original study used only university students as a convenience sample for the adult population. My replication included ages ranging from 18 to 75, which I suspected may have influenced the difference in reaction time-based findings. From our exploratory analyses, we see significant reaction time differences across age groups, which may affect the quality of the reaction time bias measurements.

Lastly, we used a continuous SPIN scale instead of the original design’s group assignments to see whether a significant correlation would emerge with greater power. We see a marginally significant positive correlation between SPIN scores and social threat endorsement, but the other continuous plots show no correlation. This could suggest several potential reasons why the replication failed. We have established that the replication did not achieve the same separation in social anxiety levels between groups. The replication may have needed higher power to detect a significant correlation between SPIN scores and endorsement rates, since adding in the middle 40% in this continuous sample yielded results that were just short of significant (p = .054). However, it is more likely that there were implementation challenges that reduced the data quality. The original study was in-person while the replication was held on Prolific, making it potentially more vulnerable to key spamming and external distractions.

References

Beard, C., & Amir, N. (2009). Interpretation in social anxiety: When meaning precedes ambiguity. Cognitive Therapy and Research, 33, 406–415. https://doi.org/10.1007/s10608-009-9235-0

Ranta, K., Kaltiala-Heino, R., Rantanen, P., Tuomisto, M. T., & Marttunen, M. (2007). Screening social phobia in adolescents from general population: The validity of the Social Phobia Inventory (SPIN) against a clinical interview. European Psychiatry, 22(4), 244-251. https://doi.org/10.1016/j.eurpsy.2006.12.002

--- title: "Replication of Beard & Amir (2010, Cognitive Therapy and Research)" author: "Eric Martz (emartz@stanford.edu)" date: "`r format(Sys.time(), '%B %d, %Y')`" format: html: code-fold: true # Makes code collapsible code-tools: true # Adds show/hide code button execute: echo: false # Hide all code by default warning: false # Hide all warnings message: false # Hide all messages ---  ## Introduction **Project Proposal** I chose this study because I am relatively new to psychology research as I pivot from my MS in Computer Science toward preparing for grad school in clinical psychology. As a result, I wanted to replicate an experiment related to clinical psychology, which posed some challenges as there are relatively few studies that are single session and can be administered entirely online. "Interpretation in Social Anxiety: When Meaning Precedes Ambiguity" by Beard & Amir is one of few studies that meet these requirements while also aligning with my research interests in anxiety and depression. The paper explores how interpretation biases change depending on the level of self-reported anxiety individuals have. The study procedure will begin by asking participants to complete three self-report measures: Social Phobia and Anxiety Inventory (SPAI; Turner et al. 1989), the State Trait Anxiety Inventory (STAI; Spielberger et al. 1970) and the Beck Depression Inventory (BDI; Beck and Steer 1987). They will then complete the Word Sentence Association Paradigm, which asks participants to stare at a fixation cross for 500 ms, read a priming word that is either benign or threat-related for 500 ms, read an ambiguous sentence, then upon finishing reading choose whether the priming word applied to the sentence or not. The reaction times, percentage of benign responses, and percentage of threat responses were measured. 
One potential challenge of replication will be that the author only shared five example sentences and prime words, but I intend to reach out to the author to request the full set of word-sentence pairs. Another will be coding the interface used to administer the experiment, since I have never built an experiment before (although I hope my other coding experience will help). Update: I was able to get in touch with the author to receive the original materials, that is, the words and sentences used in the paradigm. All trials now match exactly with the trials performed in the original experiment. You can find the original paper here: https://pmc.ncbi.nlm.nih.gov/articles/PMC2792932/ You can find my GitHub repository here: https://github.com/psych251/beard2010 You can find the preregistration here: https://osf.io/eupbf/overview --- ## Methods ### Power Analysis In the original paper, the primary between-group comparisons showed large effects (Cohen's d = 0.75–0.94). The below table shows the key effect sizes for each comparison: ```{r} library(pwr) library(knitr) library(kableExtra) n_per_group <- 26 total_analyzed <- 52 total_recruited <- 86 excluded <- 34 # Sample size summary table sample_summary <- data.frame( Description = c("Total recruited", "Excluded (middle 40% SPIN)", "Final analyzed", "Per group"), N = c(total_recruited, excluded, total_analyzed, n_per_group) ) # Original effect sizes from Beard & Amir (2009) effects <- data.frame( Effect = c("Threat bias (SA > NAC)", "Benign bias (SA > NAC)", "RT reject threat", "RT endorse benign"), Original_d = c(0.84, 0.75, 0.82, 0.94), Original_p = c("< .005", "< .02", ".006", ".002") ) # Calculate power for each effect with n=26 effects$Power_achieved <- sapply(effects$Original_d, function(d) { pwr.t.test( n = n_per_group, d = d, sig.level = 0.05, type = "two.sample" )$power }) effects$Power_pct <- paste0(round(effects$Power_achieved * 100, 1), "%") kable(effects[, c("Effect", "Original_d", "Original_p", "Power_pct")], 
caption = "Statistical Power for Key Comparisons", col.names = c("Effect", "Cohen's d", "Original p", "Power"), row.names = FALSE, align = c("l", "c", "c", "c")) %>% kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) ``` In order to determine the 80%, 90%, and 95% power to detect these effect sizes, we compute the following: ```{r} library(pwr) library(knitr) # Effect sizes from original study d_values <- c(0.75, 0.82, 0.84, 0.94) effect_names <- c("Benign bias", "RT reject threat", "Threat bias", "RT endorse benign") # Power levels to test power_levels <- c(0.80, 0.90, 0.95) # Create results table results <- data.frame() for (i in 1:length(d_values)) { for (power in power_levels) { pwr_result <- pwr.t.test( d = d_values[i], sig.level = 0.05, power = power, type = "two.sample" ) n_per_group <- ceiling(pwr_result$n) total_analyzed <- n_per_group * 2 total_recruited <- ceiling(total_analyzed / 0.54) # Account for 46% exclusion rate results <- rbind(results, data.frame( Effect = effect_names[i], d = d_values[i], Power = paste0(power * 100, "%"), n_per_group = n_per_group, Total_analyzed = total_analyzed, Total_recruited = total_recruited )) } } kable(results, col.names = c("Effect", "Cohen's d", "Target Power", "n per Group", "Total Analyzed", "Total Recruited*"), row.names = FALSE) ``` With our sample size of 26 participants per group (N = 52), we achieve 77–96% power (average 84%) to detect these effects at α = .05, two-tailed. This sample size matches the original study and provides adequate power to detect effects within the observed range. While power for the benign bias effect (77%) falls slightly below the conventional 80% threshold, this represents only a 3% reduction in detection probability. 
Given that: (1) the original study found this effect significant with identical sample size, (2) our average power across all key comparisons was 84%, (3) our primary hypothesis concerns threat bias (84% power) and (4) class budgetary restrictions prevent me from going any higher than this sample size, this should be appropriate and feasible for my replication purposes. ### Planned Sample I plan to recruit 86 participants to adhere to the methods reported in the paper which took the bottom 30% and top 30% of participants based on their score on the social anxiety scale. This would mean 26 participants in each group, with the middle 40% excluded. However, I expect the final sample size to be slightly smaller due to attrition from failed attention checks. Due to class budgetary restrictions, I cannot go above this number, so my analysis may be slightly more underpowered than the above estimates. I plan to recruit 86 participants through Prolific. The sampling frame will consist of English-speaking adults aged 18 or older. No preselection will be applied based on social anxiety scores; participants will complete the Social Phobia Inventory (SPIN) as part of the study, and group assignment will be determined post-hoc by taking the top 30% (high social anxiety; SA group) and bottom 30% (low social anxiety; NAC group) of SPIN scores, with the middle 40% excluded from analysis. ### Materials From the original paper: "Word Sentence Association Paradigm We created 76 ambiguous sentences that described social situations (e.g., “People laugh after something you said”) and 34 sentences that described non-social situations (e.g., “Part of the building is blown up”).Footnote2 We selected two words for each ambiguous sentence: one that corresponded to a threat interpretation (e.g., “embarrassing” or “terrorist”) and one that corresponded to a benign interpretation (e.g., “funny” or “construction,” see “Appendix” for more examples). 
We then divided the word sentence pairs (220 total) into two sets of materials (A and B) to create two versions of the task. Sets were matched with respect to the types of situations depicted in the sentences (e.g., dating, work, performance). Within each set participants saw 55 ambiguous sentences: once paired with the threat interpretation prime (55 trials) and once with the benign interpretation prime (55 trials) for a total of 110 trials. No word–sentence pairs were repeated across sets, and the word–sentence pairs were presented in a different random order to each participant. We randomly assigned participants to each set (Set A: n = 31; Set B: n = 21)." This procedure was followed precisely using the exact word-sentence pairs used in the original study. ### Procedure From the original paper: "Participants were assessed individually. They read and signed a consent form, provided basic demographic information, and completed the self-report measures (i.e., SPAI, STAI, BDI). Participants then completed the WSAP on a computer. Word Sentence Association Paradigm Each WSAP trial comprised four steps. First, a fixation cross appeared on the computer screen for 500 ms. The fixation cross directed the participants’ attention toward the middle of the screen and alerted them that a trial was beginning. Second, a prime representing either a threat interpretation (e.g., “embarrassing”) or a benign interpretation (e.g., “funny”) appeared in the center of the computer screen for 500 ms. Third, an ambiguous sentence (e.g., “People laugh after something you said”) appeared and remained on the screen until participants pressed the space bar indicating that they finished reading the sentence. Finally, the computer prompted participants to press ‘#1’ on the number pad if they thought the word and sentence were related or to press ‘#3’ on the number pad if the word and sentence were not related (see Fig. 1). Participants then pressed the space bar, and the next trial began. 
All text appeared in black, 12 point font against a gray background." Key differences: Due to a lack of availability of the Social Phobia and Anxiety Inventory (STAI), I used the Social Phobia Inventory (SPIN) instead. Additionally, at the recommendation of the original author, Courtney Beard, I changed the response keys from 1 and 3 to the left and right arrows to more closely emulate the position and convenience of the number pad. ### Analysis Plan Following the original study, participants who scored in the middle 40% of social anxiety scores will be excluded. Reaction times will be excluded on a trial-by-trial basis if they fall below 50 ms or above 2000 ms. In addition to these exclusions, I chose to add three attention checks to my trials. I will exclude participants who do not pass at least two of the three checks. Note that these were added after my Pilot B data was processed. The key analyses I will conduct are 1) a 2 (Group: SA, NAC) × 2 (Valence) × 2 (Response type) × 2 (Sentence type) mixed ANOVA on reaction times, with follow-up ANOVAs for social vs. non-social sentences if the four-way interaction is significant; (2) a 2 (Group) × 2 (Valence) × 2 (Sentence type) mixed ANOVA on endorsement rates; and (3) independent-samples t-tests comparing bias scores between groups. ### Differences from Original Study The most significant deviation from the original study is the different social anxiety inventory used. Due to a lack of public access for the Social Phobia and Anxiety Inventory (STAI), I used the Social Phobia Inventory (SPIN) instead. Additionally, the experiment is conducted over Prolific rather than in person as in the original study. An attention check was added during the WSAP task to exclude participants were not paying attention. As a result, the sample may be slightly smaller, since while the same sample was recruited, this likely will decrease in size once participants who failed the attention checks are excluded. 
Lastly, after my Pilot B was completed, I received guidance from the original author to change the response keys from 1 and 3 to the left and right arrows for ease of use. I do not expect any of these changes to significantly affect the outcome of the study, except for being more underpowered if the sample size is substantially smaller. #### Actual Sample The final sample consisted of 84 participants, 2 of whom were excluded for failing the attention check. 25 participants were assigned the social anxiety group, and 26 were assigned the non-anxious control group. #### Differences from pre-data collection methods plan  After Pilot B, I decided to change the response keys from 1 and 3 to left and right arrows. I also added attention checks. Otherwise, my methods remained the same. ## Results ### Data preparation ```{r} ### Data Preparation #### Load Relevant Libraries and Functions library(tidyverse) library(ez) library(lsr) library(knitr) library(kableExtra) # Configuration Settings DATA_FOLDER <- "../data/" WSAP_REFERENCE <- "../paradigm/WSAP.csv" # Reference file with word-sentence pairs # RT cleaning RT_MIN <- 50 # Minimum RT in ms RT_MAX <- 2000 # Maximum RT in ms #### Import data # Load WSAP reference file wsap_reference <- read_csv(WSAP_REFERENCE, show_col_types = FALSE) %>% rename( set = Test, valence = `Trial type`, word = Word, sentence = Sentence, domain = Domain ) %>% mutate( valence = tolower(valence), valence = if_else(valence == "negative", "threat", "benign"), sentence_type = case_when( domain == "sad" ~ "social", domain == "gad" ~ "non_social", TRUE ~ NA_character_ ), word = str_trim(word), sentence = str_trim(sentence) ) %>% select(set, word, sentence, valence, sentence_type) # Load participant data csv_files <- list.files(DATA_FOLDER, pattern = "\\.csv$", full.names = TRUE) df_list <- lapply(csv_files, function(file) { df <- read_csv(file, show_col_types = FALSE, col_types = cols(.default = "c")) if (!"subject" %in% names(df)) { df$subject <- 
      tools::file_path_sans_ext(basename(file))
  }
  return(df)
})

raw_data <- bind_rows(df_list)

raw_data <- raw_data %>%
  mutate(
    across(c(spin_total, stai_state_total, stai_trait_total, bdi_total,
             trial_index, time_elapsed),
           ~suppressWarnings(as.numeric(.))),
    across(c(word, sentence, domain, response_key, response, subject, trial_part),
           as.character)
  )

raw_data <- raw_data %>%
  group_by(subject) %>%
  fill(age, gender, race_ethnicity, education, native_english,
       spin_total, stai_state_total, stai_trait_total, bdi_total,
       .direction = "down") %>%
  ungroup()

#### Data exclusion / filtering

# Clean WSAP trials
wsap_clean <- raw_data %>%
  filter(task == "wsap") %>%
  mutate(
    word = str_trim(as.character(word)),
    sentence = str_trim(as.character(sentence)),
    judgment_rt_clean = as.numeric(rt)
  ) %>%
  left_join(
    wsap_reference %>%
      mutate(
        word = str_trim(word),
        sentence = str_trim(sentence)
      ) %>%
      select(word, sentence, valence, sentence_type),
    by = c("word", "sentence")
  ) %>%
  mutate(
    response_type = case_when(
      response_key == "1" ~ "endorsement",
      response_key == "3" ~ "rejection",
      tolower(as.character(response)) == "arrowleft" ~ "endorsement",
      tolower(as.character(response)) == "arrowright" ~ "rejection",
      TRUE ~ NA_character_
    ),
    endorsed = case_when(
      response_key == "1" ~ 1,
      tolower(as.character(response)) == "arrowleft" ~ 1,
      response_key == "3" ~ 0,
      tolower(as.character(response)) == "arrowright" ~ 0,
      TRUE ~ NA_real_
    ),
    rt_clean = case_when(
      judgment_rt_clean >= RT_MIN & judgment_rt_clean <= RT_MAX ~ judgment_rt_clean,
      TRUE ~ NA_real_
    )
  ) %>%
  filter(!is.na(valence), !is.na(sentence_type), !is.na(response_type)) %>%
  select(subject, trial_index, word, sentence, domain, valence, sentence_type,
         response_type, endorsed, judgment_rt = judgment_rt_clean, rt_clean)

# Exclude participants with incomplete data
valid_subjects <- wsap_clean %>%
  group_by(subject) %>%
  summarise(n_trials = n(), .groups = "drop") %>%
  filter(n_trials >= 50) %>%
  pull(subject)

# Extract participant-level info INCLUDING attention check data
participant_info <- raw_data %>%
  group_by(subject) %>%
  summarise(
    set_assigned = first(na.omit(set_assigned)),
    spin_total = first(na.omit(spin_total)),
    stai_state_total = first(na.omit(stai_state_total)),
    stai_trait_total = first(na.omit(stai_trait_total)),
    bdi_total = first(na.omit(bdi_total)),
    age = first(na.omit(age)),
    gender = first(na.omit(gender)),
    race_ethnicity = first(na.omit(race_ethnicity)),
    education = first(na.omit(education)),
    native_english = first(na.omit(native_english)),
    # Get attention check data from participant-level columns
    attention_passed = first(na.omit(as.numeric(attention_checks_passed))),
    attention_total = first(na.omit(as.numeric(attention_checks_total))),
    .groups = "drop"
  ) %>%
  filter(subject %in% valid_subjects)

# Filter out participants who failed attention check
failed_attention <- participant_info %>%
  filter(attention_passed == 0) %>%
  pull(subject)

# Count incomplete-data exclusions first, then drop attention-check failures
# from valid_subjects so that they are actually removed from the analysis
n_incomplete <- length(setdiff(unique(raw_data$subject), valid_subjects))
valid_subjects <- setdiff(valid_subjects, failed_attention)

# Update participant_info and wsap_clean to only include valid subjects
participant_info <- participant_info %>% filter(subject %in% valid_subjects)
wsap_clean <- wsap_clean %>% filter(subject %in% valid_subjects)

# Within-sample 30th and 70th percentile cutoffs on SPIN total
percentile_30 <- quantile(participant_info$spin_total, 0.30, na.rm = TRUE)
percentile_70 <- quantile(participant_info$spin_total, 0.70, na.rm = TRUE)

participant_info <- participant_info %>%
  mutate(
    group = case_when(
      spin_total >= percentile_70 ~ "SA",
      spin_total <= percentile_30 ~ "NAC",
      TRUE ~ "exclude"
    )
  )

### Sample Characteristics and Exclusions

#### Sample Size and Exclusions

# Total recruited
n_recruited <- n_distinct(raw_data$subject)

# Exclusions: n_incomplete was computed above, before attention failures were dropped

# Attention check failures (if applicable)
n_attention_failed <- if(exists("failed_attention")) length(failed_attention) else 0

# Middle 40% excluded based on SPIN
n_middle_excluded <- sum(participant_info$group == "exclude")

# Final sample
n_sa <- sum(participant_info$group == "SA")
n_nac <- sum(participant_info$group == "NAC")
n_analyzed <- n_sa + n_nac

# Create exclusion summary table
exclusion_summary <- data.frame(
  Stage = c("Total recruited",
            "Excluded: Incomplete data",
            "Excluded: Attention checks failed",
            "After quality exclusions",
            "Excluded: Middle 40% SPIN scores",
            "Final analyzed sample",
            " SA group (high social anxiety)",
            " NAC group (low social anxiety)"),
  N = c(n_recruited,
        n_incomplete,
        n_attention_failed,
        n_recruited - n_incomplete - n_attention_failed,
        n_middle_excluded,
        n_analyzed,
        n_sa,
        n_nac),
  Percentage = c(100,
                 round(100 * n_incomplete / n_recruited, 1),
                 round(100 * n_attention_failed / n_recruited, 1),
                 round(100 * (n_recruited - n_incomplete - n_attention_failed) / n_recruited, 1),
                 round(100 * n_middle_excluded / (n_recruited - n_incomplete - n_attention_failed), 1),
                 round(100 * n_analyzed / n_recruited, 1),
                 round(100 * n_sa / n_analyzed, 1),
                 round(100 * n_nac / n_analyzed, 1))
)

kable(exclusion_summary,
      caption = "Sample Size and Exclusions",
      col.names = c("Stage", "N", "% of Total"),
      align = c("l", "c", "c")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

#### Demographics by Group

# Calculate demographics
demo_summary <- participant_info %>%
  filter(group != "exclude") %>%  # Only analyze SA and NAC groups
  group_by(group) %>%
  summarise(
    n = n(),
    # Age distribution
    age_18_24 = sum(age == "18-24", na.rm = TRUE),
    age_25_34 = sum(age == "25-34", na.rm = TRUE),
    age_35_44 = sum(age == "35-44", na.rm = TRUE),
    age_45_54 = sum(age == "45-54", na.rm = TRUE),
    age_55_64 = sum(age == "55-64", na.rm = TRUE),
    age_65_plus = sum(age == "65 or older", na.rm = TRUE),
    # Gender distribution
    female = sum(gender == "Female", na.rm = TRUE),
    male = sum(gender == "Male", na.rm = TRUE),
    nonbinary_other = sum(gender %in% c("Non-binary", "Other"), na.rm = TRUE),
    prefer_not_say = sum(gender == "Prefer not to say", na.rm = TRUE),
    .groups = "drop"
  )

# Convert to percentages
demo_pct <- demo_summary %>%
  mutate(
    across(starts_with("age_"),
           ~ paste0(.x, " (",
                    round(100 * .x / n, 1), "%)")),
    across(c(female, male, nonbinary_other, prefer_not_say),
           ~ paste0(.x, " (", round(100 * .x / n, 1), "%)"))
  ) %>%
  select(-n)

demo_table <- data.frame(
  Characteristic = c("Age: 18-24", "Age: 25-34", "Age: 35-44", "Age: 45-54",
                     "Age: 55-64", "Age: 65+", "Gender: Female", "Gender: Male",
                     "Gender: Non-binary/Other", "Gender: Prefer not to say"),
  SA = c(demo_pct$age_18_24[demo_pct$group == "SA"],
         demo_pct$age_25_34[demo_pct$group == "SA"],
         demo_pct$age_35_44[demo_pct$group == "SA"],
         demo_pct$age_45_54[demo_pct$group == "SA"],
         demo_pct$age_55_64[demo_pct$group == "SA"],
         demo_pct$age_65_plus[demo_pct$group == "SA"],
         demo_pct$female[demo_pct$group == "SA"],
         demo_pct$male[demo_pct$group == "SA"],
         demo_pct$nonbinary_other[demo_pct$group == "SA"],
         demo_pct$prefer_not_say[demo_pct$group == "SA"]),
  NAC = c(demo_pct$age_18_24[demo_pct$group == "NAC"],
          demo_pct$age_25_34[demo_pct$group == "NAC"],
          demo_pct$age_35_44[demo_pct$group == "NAC"],
          demo_pct$age_45_54[demo_pct$group == "NAC"],
          demo_pct$age_55_64[demo_pct$group == "NAC"],
          demo_pct$age_65_plus[demo_pct$group == "NAC"],
          demo_pct$female[demo_pct$group == "NAC"],
          demo_pct$male[demo_pct$group == "NAC"],
          demo_pct$nonbinary_other[demo_pct$group == "NAC"],
          demo_pct$prefer_not_say[demo_pct$group == "NAC"])
)

kable(demo_table,
      caption = "Demographics by Group",
      col.names = c("Characteristic",
                    paste0("SA (n = ", n_sa, ")"),
                    paste0("NAC (n = ", n_nac, ")")),
      align = c("l", "c", "c")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  pack_rows("Age Distribution", 1, 6) %>%
  pack_rows("Gender Distribution", 7, 10)
```

### Confirmatory analysis

```{r}
# Calculate endorsement rates
endorsement_rates <- wsap_clean %>%
  group_by(subject, sentence_type, valence) %>%
  summarise(
    pct_endorsed = mean(endorsed, na.rm = TRUE) * 100,
    n_trials = n(),
    .groups = "drop"
  ) %>%
  select(-n_trials) %>%  # REMOVE n_trials before pivoting!
  pivot_wider(
    names_from = c(sentence_type, valence),
    values_from = pct_endorsed,
    names_prefix = "pct_"
  )

# Calculate reaction times
rt_means <- wsap_clean %>%
  filter(!is.na(rt_clean)) %>%
  group_by(subject, sentence_type, valence, response_type) %>%
  summarise(
    mean_rt = mean(rt_clean, na.rm = TRUE),
    n_trials = n(),
    .groups = "drop"
  ) %>%
  select(-n_trials) %>%  # KEY FIX: Remove n_trials before pivoting
  pivot_wider(
    names_from = c(sentence_type, valence, response_type),
    values_from = mean_rt,
    names_prefix = "rt_"
  )

# Calculate bias scores
bias_scores <- rt_means %>%
  mutate(
    # Threat bias = RT(reject threat) - RT(endorse threat)
    threat_bias_social = rt_social_threat_rejection - rt_social_threat_endorsement,
    benign_bias_social = rt_social_benign_endorsement - rt_social_benign_rejection,
    threat_bias_nonsocial = rt_non_social_threat_rejection - rt_non_social_threat_endorsement,
    benign_bias_nonsocial = rt_non_social_benign_endorsement - rt_non_social_benign_rejection
  ) %>%
  select(subject, contains("bias"))

# Combine all participant data
participant_data <- participant_info %>%
  left_join(endorsement_rates, by = "subject") %>%
  left_join(rt_means, by = "subject") %>%
  left_join(bias_scores, by = "subject")

# Filter to analysis groups only
analysis_data <- participant_data %>%
  filter(group %in% c("SA", "NAC"))

# Prepare long format for ANOVAs
anova_data_long <- wsap_clean %>%
  left_join(participant_info %>% select(subject, group, spin_total), by = "subject") %>%
  filter(group %in% c("SA", "NAC")) %>%
  select(subject, group, spin_total, sentence_type, valence, response_type,
         rt_clean, endorsed, trial_index) %>%
  mutate(
    group = factor(group, levels = c("NAC", "SA")),
    sentence_type = factor(sentence_type, levels = c("social", "non_social")),
    valence = factor(valence, levels = c("benign", "threat")),
    response_type = factor(response_type, levels = c("endorsement", "rejection"))
  )

completeness_check <- anova_data_long %>%
  filter(!is.na(rt_clean)) %>%
  group_by(subject, sentence_type,
           valence, response_type) %>%
  summarise(n = n(), .groups = "drop") %>%
  complete(subject, sentence_type, valence, response_type, fill = list(n = 0))

# Find participants with missing cells (0 trials in any condition)
incomplete_subjects <- completeness_check %>%
  filter(n == 0) %>%
  pull(subject) %>%
  unique()

# Option 1: Exclude participants with incomplete data
anova_data_complete <- anova_data_long %>%
  filter(!subject %in% incomplete_subjects, !is.na(rt_clean))

#### Primary Mixed ANOVA: Reaction Times
# Group × Valence × Response Type × Sentence Type
anova_rt <- ezANOVA(
  data = anova_data_complete,  # Use complete data
  dv = rt_clean,
  wid = subject,
  within = .(valence, response_type, sentence_type),
  between = group,
  type = 3,
  detailed = TRUE
)

#### Follow-up ANOVA: Social Sentences Only
anova_rt_social <- ezANOVA(
  data = anova_data_complete %>% filter(sentence_type == "social"),
  dv = rt_clean,
  wid = subject,
  within = .(valence, response_type),
  between = group,
  type = 3,
  detailed = TRUE
)

#### Independent t-tests: Bias Scores
# Threat bias
threat_bias_test <- t.test(threat_bias_social ~ group, data = analysis_data)
threat_bias_d <- cohensD(threat_bias_social ~ group, data = analysis_data)

# Benign bias
benign_bias_test <- t.test(benign_bias_social ~ group, data = analysis_data)
benign_bias_d <- cohensD(benign_bias_social ~ group, data = analysis_data)
```

```{r}
### Descriptive Statistics: Original vs Replication

# Original study values (from Table 2)
original_descriptives <- data.frame(
  Measure = c(
    # Self-report indices
    "Social threat endorsement (%)",
    "Social benign endorsement (%)",
    "Non-social threat endorsement (%)",
    "Non-social benign endorsement (%)",
    # Social sentences RT
    "RT: Endorse threat (social)",
    "RT: Endorse benign (social)",
    "RT: Reject threat (social)",
    "RT: Reject benign (social)",
    "Threat bias score (social)",
    "Benign bias score (social)",
    # Non-social sentences RT
    "RT: Endorse threat (non-social)",
    "RT: Endorse benign (non-social)",
    "RT: Reject threat (non-social)",
    "RT: Reject benign (non-social)"
  ),
  Original_SA_M = c(59, 52, 59, 64, 532, 657, 626, 577, 94, 80, 550, 538, 571, 662),
  Original_SA_SD = c(18, 19, 18, 15, 163, 226, 267, 200, 203, 181, 183, 183, 187, 286),
  Original_NAC_M = c(30, 71, 49, 71, 485, 460, 447, 507, -39, -47, 434, 477, 470, 513),
  Original_NAC_SD = c(18, 14, 23, 18, 186, 199, 169, 228, 101, 166, 121, 198, 198, 235)
)

# Calculate replication values
# Columns of analysis_data, in the same order as the Measure labels above
measure_cols <- c(
  # Self-report indices
  "pct_social_threat",
  "pct_social_benign",
  "pct_non_social_threat",
  "pct_non_social_benign",
  # Social sentences RT
  "rt_social_threat_endorsement",
  "rt_social_benign_endorsement",
  "rt_social_threat_rejection",
  "rt_social_benign_rejection",
  "threat_bias_social",
  "benign_bias_social",
  # Non-social sentences RT
  "rt_non_social_threat_endorsement",
  "rt_non_social_benign_endorsement",
  "rt_non_social_threat_rejection",
  "rt_non_social_benign_rejection"
)

# Apply a summary function (mean or sd) to one column within one group
group_stat <- function(col, grp, fn) {
  fn(analysis_data[[col]][analysis_data$group == grp], na.rm = TRUE)
}

replication_descriptives <- data.frame(
  Measure = original_descriptives$Measure,
  Replication_SA_M = vapply(measure_cols, group_stat, numeric(1),
                            grp = "SA", fn = mean, USE.NAMES = FALSE),
  Replication_SA_SD = vapply(measure_cols, group_stat, numeric(1),
                             grp = "SA", fn = sd, USE.NAMES = FALSE),
  Replication_NAC_M = vapply(measure_cols, group_stat, numeric(1),
                             grp = "NAC", fn = mean, USE.NAMES = FALSE),
  Replication_NAC_SD = vapply(measure_cols, group_stat, numeric(1),
                              grp = "NAC", fn = sd, USE.NAMES = FALSE)
)

# Combine tables
combined_descriptives <- original_descriptives %>%
  left_join(replication_descriptives, by = "Measure") %>%
  mutate(
    Original_SA = sprintf("%.0f (%.0f)", Original_SA_M, Original_SA_SD),
    Original_NAC = sprintf("%.0f (%.0f)", Original_NAC_M, Original_NAC_SD),
    Replication_SA = sprintf("%.0f (%.0f)", Replication_SA_M, Replication_SA_SD),
    Replication_NAC = sprintf("%.0f (%.0f)", Replication_NAC_M, Replication_NAC_SD)
  ) %>%
  select(Measure, Original_SA, Original_NAC, Replication_SA, Replication_NAC)

kable(combined_descriptives,
      caption = "Table 1: Descriptive Statistics - Original vs Replication",
      col.names = c("Measure", "Original SA M (SD)", "Original NAC M (SD)",
                    "Replication SA M (SD)", "Replication NAC M (SD)"),
      align = c("l", "c", "c", "c", "c")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

#### Independent tests: Endorsement Rates

# Threat endorsement: SA > NAC
threat_endorse_aov <- aov(pct_social_threat ~ group, data = analysis_data)
threat_endorse_summary <- summary(threat_endorse_aov)
threat_endorse_f <- threat_endorse_summary[[1]]$`F value`[1]
threat_endorse_df1 <- threat_endorse_summary[[1]]$Df[1]
threat_endorse_df2 <- threat_endorse_summary[[1]]$Df[2]
threat_endorse_p <- threat_endorse_summary[[1]]$`Pr(>F)`[1]

# Benign endorsement: NAC > SA
benign_endorse_aov <- aov(pct_social_benign ~ group, data = analysis_data)
benign_endorse_summary <- summary(benign_endorse_aov)
benign_endorse_f <- benign_endorse_summary[[1]]$`F value`[1]
benign_endorse_df1 <- benign_endorse_summary[[1]]$Df[1]
benign_endorse_df2 <- benign_endorse_summary[[1]]$Df[2]
benign_endorse_p <- benign_endorse_summary[[1]]$`Pr(>F)`[1]

#### Independent t-tests: Bias Scores (keep as t-tests like original)
# Threat bias
threat_bias_test <- t.test(threat_bias_social ~ group, data = analysis_data)

# Benign bias
benign_bias_test <-
  t.test(benign_bias_social ~ group, data = analysis_data)

# SA within-group: threat vs benign endorsement
sa_within <- t.test(analysis_data$pct_social_threat[analysis_data$group == "SA"],
                    analysis_data$pct_social_benign[analysis_data$group == "SA"],
                    paired = TRUE)

# NAC within-group: threat vs benign endorsement
nac_within <- t.test(analysis_data$pct_social_threat[analysis_data$group == "NAC"],
                     analysis_data$pct_social_benign[analysis_data$group == "NAC"],
                     paired = TRUE)

# Map p-values to significance markers (same thresholds as the table footnote)
sig_stars <- function(p) {
  ifelse(p < 0.001, "***",
         ifelse(p < 0.01, "**",
                ifelse(p < 0.05, "*", "ns")))
}

# Create comparison table with proper direction checking
effects_comparison <- data.frame(
  Effect = c(
    "Threat bias",
    "Benign bias",
    "Threat endorsement",
    "Benign endorsement",
    "SA: Threat vs Benign endorsement",
    "NAC: Benign vs Threat endorsement"
  ),
  Original_Statistic = c("t(50) = 2.98", "t(50) = 2.64", "F(1,50) = 34.26",
                         "F(1,50) = 16.64", "t(25) = 1.39", "t(25) = -10.56"),
  Original_p = c("< .005", "< .02", "< .001", "< .001", ".16", "< .001"),
  Original_sig = c("***", "*", "***", "***", "ns", "***"),
  Replication_Statistic = c(
    sprintf("t(%.0f) = %.2f", threat_bias_test$parameter, threat_bias_test$statistic),
    sprintf("t(%.0f) = %.2f", benign_bias_test$parameter, benign_bias_test$statistic),
    sprintf("F(%d,%d) = %.2f", threat_endorse_df1, threat_endorse_df2, threat_endorse_f),
    sprintf("F(%d,%d) = %.2f", benign_endorse_df1, benign_endorse_df2, benign_endorse_f),
    sprintf("t(%.0f) = %.2f", sa_within$parameter, sa_within$statistic),
    sprintf("t(%.0f) = %.2f", nac_within$parameter, nac_within$statistic)
  ),
  Replication_p = c(
    format.pval(threat_bias_test$p.value, digits = 3),
    format.pval(benign_bias_test$p.value, digits = 3),
    format.pval(threat_endorse_p, digits = 3),
    format.pval(benign_endorse_p, digits = 3),
    format.pval(sa_within$p.value, digits = 3),
    format.pval(nac_within$p.value, digits = 3)
  ),
  Replication_sig = sig_stars(c(
    threat_bias_test$p.value,
    benign_bias_test$p.value,
    threat_endorse_p,
    benign_endorse_p,
    sa_within$p.value,
    nac_within$p.value
  )),
  Replicated = c(
    {
      sa_threat <- mean(analysis_data$threat_bias_social[analysis_data$group == "SA"], na.rm=TRUE)
      nac_threat <- mean(analysis_data$threat_bias_social[analysis_data$group == "NAC"], na.rm=TRUE)
      if(threat_bias_test$p.value < 0.05 && sa_threat > nac_threat && sa_threat > 0 && nac_threat < 0) {
        "Yes"
      } else if(threat_bias_test$p.value < 0.05 && sa_threat > nac_threat) {
        "Partial"
      } else {
        "No"
      }
    },
    {
      sa_benign <- mean(analysis_data$benign_bias_social[analysis_data$group == "SA"], na.rm=TRUE)
      nac_benign <- mean(analysis_data$benign_bias_social[analysis_data$group == "NAC"], na.rm=TRUE)
      # Check if pattern matches: SA should be positive (lacking benign bias), NAC negative (having it)
      if(benign_bias_test$p.value < 0.05 && sa_benign > 0 && nac_benign < 0) {
        "Yes"
      } else {
        "No"
      }
    },
    {
      sa_threat_end <- mean(analysis_data$pct_social_threat[analysis_data$group == "SA"], na.rm=TRUE)
      nac_threat_end <- mean(analysis_data$pct_social_threat[analysis_data$group == "NAC"], na.rm=TRUE)
      if(threat_endorse_p < 0.05 && sa_threat_end > nac_threat_end) {
        "Yes"
      } else {
        "No"
      }
    },
    {
      sa_benign_end <- mean(analysis_data$pct_social_benign[analysis_data$group == "SA"], na.rm=TRUE)
      nac_benign_end <- mean(analysis_data$pct_social_benign[analysis_data$group == "NAC"], na.rm=TRUE)
      if(benign_endorse_p < 0.05 && nac_benign_end > sa_benign_end) {
        "Yes"
      } else {
        "No"
      }
    },
    ifelse(sa_within$p.value >= 0.05, "Yes", "No"),
    ifelse(nac_within$p.value < 0.05 && nac_within$statistic < 0, "Yes", "No")
  )
)

kable(effects_comparison,
      caption = "Table 2: Key Statistical Effects - Original vs Replication",
      col.names = c("Effect", "Original Test", "p", "Sig",
                    "Replication Test", "p", "Sig", "Replicated?"),
      align = c("l", "c", "c", "c", "c", "c", "c", "c")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  add_footnote(c("* p < .05, ** p < .01, *** p < .001, ns = not significant"),
               notation = "none")

write.csv(effects_comparison, "table2_effects.csv", row.names = FALSE)
```

### Exploratory analyses

The confirmatory analyses show a much smaller difference in social threat endorsement between the two groups than in the original study. To better understand these results, we examine the effect of the group assignment split. First, we test whether treating SPIN scores as a continuous measure and including all participants yields similar results.
```{r}
library(ggplot2)

# Calculate correlation
cor_value <- cor.test(participant_data$spin_total, participant_data$pct_social_threat)

ggplot(data = participant_data, aes(x = spin_total, y = pct_social_threat)) +
  geom_point(alpha = 0.6, size = 3, color = "steelblue") +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_smooth(method = "lm", color = "darkblue", fill = "lightblue", linewidth = 1.2) +
  labs(
    title = "Social Anxiety and Social Threat Perception",
    x = "SPIN Total Score",
    y = "Percent Social Threat Endorsement (%)",
    caption = sprintf("r = %.2f, p %s", cor_value$estimate,
                      ifelse(cor_value$p.value < .001, "< .001",
                             paste("=", round(cor_value$p.value, 3))))
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(size = 12),
    panel.grid.minor = element_blank()
  )

cor_benign <- cor.test(participant_data$spin_total, participant_data$pct_social_benign)

ggplot(data = participant_data, aes(x = spin_total, y = pct_social_benign)) +
  geom_point(alpha = 0.6, size = 3, color = "coral") +
  geom_smooth(method = "lm", color = "darkred", fill = "lightcoral", linewidth = 1.2) +
  labs(
    title = "Social Anxiety and Social Benign Perception",
    x = "SPIN Total Score",
    y = "Percent Social Benign Endorsement (%)",
    caption = sprintf("r = %.2f, p %s", cor_benign$estimate,
                      ifelse(cor_benign$p.value < .001, "< .001",
                             paste("=", round(cor_benign$p.value, 3))))
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(size = 12),
    panel.grid.minor = element_blank()
  )
```

We then look at the reaction time data to see whether the continuous scale reveals any different trends.
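
The reaction time bias scores are simple differences of mean reaction times. As a rough check on the definitions, the sketch below recomputes the social bias scores from the original study's group means as entered in the descriptives chunk above. The vector names are illustrative and are not part of the analysis pipeline. Because the reported bias scores are averages of per-participant differences, they only need to match these differences of group means up to rounding.

```r
# Original study group means (ms) for social sentences, copied from the
# Original_SA_M and Original_NAC_M vectors in the descriptives chunk
endorse_threat <- c(SA = 532, NAC = 485)
reject_threat  <- c(SA = 626, NAC = 447)
endorse_benign <- c(SA = 657, NAC = 460)
reject_benign  <- c(SA = 577, NAC = 507)

# Same definitions as in the confirmatory chunk
threat_bias <- reject_threat - endorse_threat   # RT(reject threat) - RT(endorse threat)
benign_bias <- endorse_benign - reject_benign   # RT(endorse benign) - RT(reject benign)

print(threat_bias)  # SA = 94, NAC = -38 (reported in the original: 94 and -39)
print(benign_bias)  # SA = 80, NAC = -47 (reported in the original: 80 and -47)
```

Under these definitions, a positive threat bias means threat associations were endorsed faster than they were rejected, and a positive benign bias means benign associations were endorsed more slowly than they were rejected.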
```{r}
cor_value2 <- cor.test(participant_data$spin_total, participant_data$threat_bias_social)

ggplot(data = participant_data, aes(x = spin_total, y = threat_bias_social)) +
  geom_point(alpha = 0.6, size = 3, color = "coral") +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_smooth(method = "lm", color = "darkred", fill = "lightcoral", linewidth = 1.2) +
  labs(
    title = "Social Anxiety and Threat Bias",
    x = "SPIN Total Score",
    y = "Social Threat Bias",
    caption = sprintf("r = %.2f, p %s", cor_value2$estimate,
                      ifelse(cor_value2$p.value < .001, "< .001",
                             paste("=", round(cor_value2$p.value, 3))))
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(size = 12),
    panel.grid.minor = element_blank()
  )

# cor_value2 is reused for the benign bias correlation
cor_value2 <- cor.test(participant_data$spin_total, participant_data$benign_bias_social)

ggplot(data = participant_data, aes(x = spin_total, y = benign_bias_social)) +
  geom_point(alpha = 0.6, size = 3, color = "coral") +
  geom_smooth(method = "lm", color = "darkred", fill = "lightcoral", linewidth = 1.2) +
  labs(
    title = "Social Anxiety and Benign Bias",
    x = "SPIN Total Score",
    y = "Social Benign Bias",
    caption = sprintf("r = %.2f, p %s", cor_value2$estimate,
                      ifelse(cor_value2$p.value < .001, "< .001",
                             paste("=", round(cor_value2$p.value, 3))))
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(size = 12),
    panel.grid.minor = element_blank()
  )
```

Since SPIN scores had no significant effect on the threat and benign reaction time biases in the confirmatory analyses, we next examine whether reaction times differed by age.
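
One way to check whether reaction times differ by age bracket, rather than judging from a plot alone, is a one-way test across the brackets. The sketch below runs on simulated data: the `toy` data frame, its column names, and the simulated means are hypothetical and only illustrate the call pattern, not the results of this study.

```r
set.seed(1)

# Hypothetical per-participant mean RTs (ms) in three age brackets
toy <- data.frame(
  age = factor(rep(c("18-24", "35-44", "55-64"), each = 20)),
  rt  = c(rnorm(20, 550, 80), rnorm(20, 600, 80), rnorm(20, 680, 80))
)

# One-way ANOVA of mean RT on age bracket
fit <- aov(rt ~ age, data = toy)
age_p <- summary(fit)[[1]]$`Pr(>F)`[1]

print(summary(fit))
```

With brackets as small as some of those in this sample, a test like this would have limited power, so a non-significant result would not rule out age effects.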
```{r}
library(dplyr)

summary_data <- participant_data %>%
  group_by(age) %>%
  summarize(
    mean_rt = mean(rt_social_threat_endorsement, na.rm = TRUE),
    # Standard error based on the number of non-missing values in each bracket
    se_rt = sd(rt_social_threat_endorsement, na.rm = TRUE) /
      sqrt(sum(!is.na(rt_social_threat_endorsement))),
    n = n()
  )

ggplot(summary_data, aes(x = age, y = mean_rt)) +
  geom_col(fill = "steelblue", alpha = 0.7, width = 0.8) +
  geom_errorbar(aes(ymin = mean_rt - se_rt, ymax = mean_rt + se_rt),
                width = 0.3, linewidth = 0.8) +
  # age is categorical, so group = 1 is needed to fit one trend line across brackets
  geom_smooth(aes(group = 1), method = "lm", se = FALSE,
              color = "darkred", linetype = "dashed", linewidth = 1) +
  geom_text(aes(label = paste0("n=", n)), vjust = -1.5, size = 3.5) +
  labs(
    title = "Response Time to Social Threat by Age",
    x = "Age group (years)",
    y = "Mean Response Time (ms)",
    caption = "Error bars show ±SE"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(size = 12),
    panel.grid.minor = element_blank()
  ) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15)))  # Add space at top for labels
```

Lastly, we compare the sample mean of the social anxiety scores with community norms to see whether the sample was significantly more anxious.
```{r}
# Community mean SPIN score used for comparison (see Discussion)
community_mean <- 11.3

# Basic one-sample t-test
t_result <- t.test(participant_data$spin_total, mu = community_mean)

sample_mean <- t_result$estimate    # Mean of x
ci_lower <- t_result$conf.int[1]    # Lower bound of 95% CI
ci_upper <- t_result$conf.int[2]    # Upper bound of 95% CI
sample_n <- t_result$parameter + 1  # df + 1 = number of non-missing scores

means_df <- data.frame(
  group = c("Sample", "Community"),
  mean = c(sample_mean, community_mean),
  ci_lower = c(ci_lower, community_mean),
  ci_upper = c(ci_upper, community_mean)
)

ggplot() +
  # Background: individual points (faded)
  geom_jitter(data=participant_data, aes(x=1, y=spin_total),
              width=0.15, alpha=0.15, color="steelblue", size=2) +
  # Sample mean with 95% CI
  geom_errorbar(data=filter(means_df, group=="Sample"),
                aes(x=1, ymin=ci_lower, ymax=ci_upper),
                color="darkblue", width=0.15, linewidth=1.5) +
  geom_point(data=filter(means_df, group=="Sample"),
             aes(x=1, y=mean), color="darkblue", size=6, shape=18) +
  geom_text(data=filter(means_df, group=="Sample"),
            aes(x=1, y=mean, label=round(mean, 1)),
            hjust=-0.8, size=5, fontface="bold") +
  # Community mean (no CI)
  geom_point(data=filter(means_df, group=="Community"),
             aes(x=1, y=mean), color="red", size=6, shape=18) +
  geom_text(data=filter(means_df, group=="Community"),
            aes(x=1, y=mean, label=round(mean, 1)),
            hjust=-0.8, size=5, fontface="bold", color="red") +
  # Add legend manually
  annotate("point", x=0.7, y=48, color="darkblue", size=5, shape=18) +
  annotate("text", x=0.75, y=48, label="Sample Mean (95% CI)", hjust=0, size=4) +
  annotate("point", x=0.7, y=45, color="red", size=5, shape=18) +
  annotate("text", x=0.75, y=45, label="Community Mean", hjust=0, size=4) +
  scale_x_continuous(limits=c(0.6, 1.4), breaks=1, labels="") +
  labs(title="Sample Anxiety Scores vs. Community Mean",
       subtitle=paste0("N = ", sample_n),
       y="SPIN Total Score",
       x="") +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        panel.grid.major.x = element_blank())
```

## Discussion

### Summary of Replication Attempt

This experiment resulted in a partial replication of the original study's findings. Consistent with the original findings, this replication found that while there were no significant differences in endorsement rates of the threat and benign associations for the social anxiety (SA) group, the non-anxious control (NAC) group endorsed benign associations significantly more than threat associations. However, my attempt failed to replicate the significant between-group differences shown in the original results, namely that in comparison to the NAC group, the SA group was significantly more likely to endorse threat associations, less likely to endorse benign associations, faster to endorse social threat associations, and slower to reject social threat associations.

### Commentary

In our exploratory analysis, we aimed to better understand why the between-group differences did not replicate. One surprising result from the confirmatory analysis was the high social threat endorsement rate in the NAC group, which was almost twice as high as in the original experiment. There are two likely reasons for this difference, both of which were methodological deviations from the original study.

First, the original study assigned groups based on community norms of the Social Phobia and Anxiety Inventory (SPAI), such that the NAC group scored lower than the 30th percentile on the SPAI and the SA group scored higher than the 70th percentile. Due to sample size constraints, as well as the lack of available community norms for the public inventory used in the replication (the Social Phobia Inventory, SPIN), groups were instead assigned based on within-sample percentiles.
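
To illustrate why within-sample percentiles can blur the contrast between groups, the sketch below compares fixed cutoffs with within-sample 30th and 70th percentile cutoffs on simulated scores. Everything in it is hypothetical: the simulated distribution, the 0 to 68 clipping range, and the fixed cutoffs of 10 and 25 are made-up values for illustration and are not taken from SPAI or SPIN norms.

```r
set.seed(42)

# Hypothetical questionnaire totals from a sample that skews high,
# rounded and clipped to an assumed 0-68 range
scores <- round(pmin(pmax(rnorm(80, mean = 24, sd = 10), 0), 68))

# Within-sample cutoffs, mirroring the percentile split used in this replication
cut_low  <- quantile(scores, 0.30)
cut_high <- quantile(scores, 0.70)
group_sample <- ifelse(scores >= cut_high, "SA",
                       ifelse(scores <= cut_low, "NAC", "exclude"))

# Fixed cutoffs standing in for population norms (made-up values)
norm_low  <- 10
norm_high <- 25
group_norm <- ifelse(scores >= norm_high, "SA",
                     ifelse(scores <= norm_low, "NAC", "exclude"))

# Compare group sizes and the mean score of each "non-anxious" group
print(table(group_sample))
print(table(group_norm))
print(c(sample_split = mean(scores[group_sample == "NAC"]),
        norm_split   = mean(scores[group_norm == "NAC"])))
```

When a sample skews high, the within-sample NAC group can include participants who would not count as low in anxiety under fixed norms, which shrinks the expected contrast between the two groups.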
We see in our exploratory analysis that the sample mean anxiety score was significantly higher than the community mean reported in a study of adolescents (Ranta et al., 2007). Considering that the clinical cutoff for social anxiety in that study was a score of 19 and the sample mean here was 24, this sample appears to have been relatively high in social anxiety, which may explain the lack of significant differences between the bottom 30 percent and top 30 percent of the sample.

Additionally, the original study used only university students as a convenience sample for the adult population. My replication included participants ranging in age from 18 to 75, which I suspected may have influenced the difference in reaction time-based findings. From our exploratory analyses, we see reaction time differences across age groups, which may affect the quality of the reaction time bias measurements.

Lastly, we treated SPIN scores as continuous instead of using the original design's group assignments, to see whether a significant correlation would emerge with the greater power of the full sample. We see a marginally significant positive correlation between SPIN scores and social threat endorsement, but the other continuous plots show no correlation. This suggests several potential reasons why the replication failed. We have established that the replication did not achieve the same variance in social anxiety levels across groups. The replication may have needed higher power to detect a significant correlation between SPIN scores and endorsement rates, since adding in the middle 40% in this continuous analysis yielded results that were just short of significant (p = .054). However, it is more likely that implementation challenges reduced the data quality. The original study was conducted in person while the replication was run online through Prolific, making it potentially more vulnerable to key spamming and external distractions.

### References

Beard, C., & Amir, N. (2009). Interpretation in social anxiety: When meaning precedes ambiguity. Cognitive Therapy and Research, 33, 406–415. https://doi.org/10.1007/s10608-009-9235-0

Ranta, K., Kaltiala-Heino, R., Rantanen, P., Tuomisto, M. T., & Marttunen, M. (2007). Screening social phobia in adolescents from general population: The validity of the Social Phobia Inventory (SPIN) against a clinical interview. European Psychiatry, 22(4), 244–251. https://doi.org/10.1016/j.eurpsy.2006.12.002