Replication of Repeat after us: Syntactic alignment is not partner-specific by Rachel Ostrand and V.S. Ferreira (2019, Psychological Science)
Author
A.J. Schwartz (avschwartz@ucsd.edu)
Published
December 9, 2024
Introduction
I chose this experiment because of my current research interest in conversational alignment in psycholinguistics. In short, the experiment tests whether speakers align syntactically to their specific conversation partner or align more generally to the structures they have recently been exposed to. My past research, while psycholinguistic in nature, focused more on learning than on conversation, and I hope this project will encourage my reading and exploration of syntactic alignment. The paper was produced partly by Dr. Ferreira’s lab, where I am working, so I will have access to people who are familiar with the data and the background of the study. Additionally, I have very little familiarity with analyzing psycholinguistic data; the experiment I conducted as an undergraduate mostly required analysis of cognitive test results. I hope to do the computational reproducibility project rather than the replication project because I believe that hands-on experience with data analysis is more important to my studies than learning something like jsPsych, especially because much of my experimentation happens in person. I also have access to data very much like what I would want to collect for this experiment, without having to spend the many hours it takes to transcribe recordings.
While I have access to the data, I have no access to the code used to run the analyses or create the figures. For this project, I hope to reproduce the findings of Experiments 4 and 5, which I consider the most important experiments in the paper. Experiment 4 eliminates some of the issues in the previous three experiments and demonstrates a lack of partner-specific alignment. Experiment 5 takes that further and shows a lack of partner-specific alignment even when the partner may not understand other syntactic structures. I hope to fit a 4 (Syntax Exposure Condition: all-Prepositional Dative, speaker-specific, mixed, all-Double Object) × 2 (Listener: Experimenter A, Experimenter B) GLMM for Experiment 4, with one sub-model comparing the two single-structure conditions against each other and another comparing the speaker-specific and mixed conditions against each other. I will do the same analysis for Experiment 5. Each model, including the sub-models, will have a visualization; the paper this is based on does not visualize the sub-models. I hope to gain some experience with ggplot: I have a strong background in design and a general interest in data visualization, but no hands-on experience with it. The main challenge will be working with linguistic rather than numerical data and figuring out how to translate transcribed sentences into data. I would also like to take some time to make the visualizations not only legible but visually appealing. In the past, I’ve relied on programs like JASP and Jamovi to analyze my data for me, but I think it will be important to learn how to do that directly in R, considering the limitations of such software.
Methods
Power Analysis
A power analysis could not be conducted: the original analysis used a GLMM, and no effect size was reported.
Planned Sample
Experiment Four
96 participants were recruited from the subject pool at the University of California, San Diego. Participants were excluded if they had taken part in previous experiments in the same study. Participants were required to be native, monolingual speakers of English.
Experiment Five
96 participants were recruited from the subject pool at the University of California, San Diego. Participants were excluded if they had taken part in previous experiments in the same study. Participants were required to be native, monolingual speakers of English.
Materials
“The stimuli were 96 colored pictures … consisting of 72 unique dative pictures and 24 unique intransitive pictures. As before, the dative events could be described using either a prepositional dative (PD) or double object (DO) structure. For item counterbalancing purposes, the dative pictures were divided into three item sets of 24. For a particular participant, Experimenter A described one set of dative pictures, Experimenter B described a different set, and the participant described the third set. Across participants, the items sets were counterbalanced among the three speakers in the experiment, so that a particular picture was described by Experimenter A for 1/3 of participants, described by Experimenter B for 1/3 of participants, and described by the participant for 1/3 of participants.”
“In certain conditions, the intransitive pictures were described by the experimenters as filler items, to hold constant the number of exposure sentences of a given dative structure across the conditions … Additionally, all participants described the intransitive pictures interleaved with the critical dative pictures, to reduce the influence of self-priming from one sentence to the next. The intransitive pictures had a simple event structure (e.g., “The woman is sleeping”) which made it unlikely that participants would produce a sentence containing even a single object, so as not to prime one of the dative structures more than the other.”
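The item-set counterbalancing described in the quoted materials can be illustrated with a small sketch. It assumes a simple Latin-square rotation of three sets across three speaker roles; the set labels, list labels, and the `rotate()` helper are hypothetical and are not taken from the original study materials.

```r
# Hypothetical labels for the three item sets and the three speaker roles.
item_sets <- c("Set1", "Set2", "Set3")
roles <- c("ExperimenterA", "ExperimenterB", "Participant")

# Rotate a vector left by k positions (helper introduced for illustration).
rotate <- function(x, k) {
  n <- length(x)
  x[((seq_len(n) - 1 + k) %% n) + 1]
}

# Each counterbalancing list shifts the sets by one position (Latin square),
# so every set is described by every role in exactly one list.
assignment <- t(sapply(0:2, function(k) rotate(item_sets, k)))
dimnames(assignment) <- list(paste0("List", 1:3), roles)
print(assignment)
```

Assigning one third of participants to each list would give the 1/3 split per speaker that the quoted passage describes.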
Procedure
Overall Procedure
“Participants were told they were playing a conversational picture-matching game with the experimenters. One participant and one experimenter sat across a table from each other, separated by an opaque barrier that was high enough to block the other’s table space but low enough to easily see each other’s face and upper body. Each partner had a series of pictures; the task throughout the experiment was for one partner to describe the pictures to the other partner, who put his/her own pictures in the same order.
Each participant interacted with two experimenters, one at a time. Both experimenters were female. A round began when the first experimenter entered the participant’s room and described six distinct pictures to the participant (two each of transitive, locative, and dative events), while the participant arranged his own cards in the order described (Exposure Phase A). The first experimenter gave the participant two 2-digit multiplication problems to complete and left the room. The purpose of the math problems was to provide a cover task to allow the experimenter to leave; performance on the math problems was not measured. After 30 seconds, the second experimenter entered the participant’s room, collected the math problems, and described a new set of six distinct pictures to the participant (Exposure Phase B). The second experimenter then gave the participant a new pair of math problems and left the room. Finally, one experimenter returned, this time as the listener, and laid out the participant’s pictures in a predetermined and pseudo-random order such that two pictures of the same event type would not be described consecutively. The participant described all 12 pictures that he had just heard (both experimenters’ full set of six) to the listening experimenter (Test Phase). Thus, for each picture the participant described, the listening experimenter was either the same or different person as the experimenter who had originally described that picture to the participant. This process comprised a complete round (Experimenter A described six unique pictures, Experimenter B described six unique pictures, then the participant described the same 12 pictures), and occurred for four rounds, each containing different pictures. Each experimenter described a total of 24 distinct pictures, and the participant described all 48 pictures over the course of the experiment.
All factors (including nuisance factors) were fully counterbalanced either (or both) within or between participants. The order of the two experimenters during each exposure phase, and identity of the experimenter during each test phase, was counterbalanced across rounds for a given participant and also across participants. That is, each participant listened to Experimenter A and then Experimenter B during two exposure rounds, and Experimenter B and then Experimenter A during the other two exposure rounds. Similarly, each participant described his pictures to Experimenter A for two test rounds and to Experimenter B for the other two test rounds. Additionally, picture-structure mapping was counterbalanced across participants, such that half of participants heard (e.g.) Fig. 1 described using a DO, and the other half heard it described using a PD. The order in which each experimenter described her pictures, and the order in which participants described their pictures, was also counterbalanced across participants.
One experimenter described each transitive picture using an active sentence, each locative picture using a with-locative sentence, and each dative picture using a double object sentence. The other experimenter produced only passive, on-locative, and prepositional dative sentences. The identity of the experimenter who had each syntactic preference was also counterbalanced between participants, such that half of the participants heard actives, with-locatives, and DOs from “Hannah”, and passives, on-locatives, and PDs from “Victoria”, and the other half of participants heard the reversed mapping. At the start of the experiment, participants were told there were multiple experimenters running the experiment at the same time, and thus they might encounter a new experimenter later in the experiment, but were never explicitly informed about each experimenter’s syntactic preferences, or that the pictures could be described using multiple syntactic structures.”
Experiment 4 Specific Procedure
“The general procedure was similar to that of the previous experiments, with a few important differences. As before, each subject interacted with two experimenters across the experiment, and only one experimenter was ever in the room with the subject at a time. Across participants, there were seven people who acted as experimenters, six female and one male; different participants were tested by different pairs of experimenters, assigned non-systematically based on availability.
One goal for Experiment 4 was to ensure that the experimental manipulation was sufficiently powerful to detect partner-specific syntactic alignment should such alignment exist. This was addressed with four methodological changes from the previous experiments. First, to increase the amount of exposure to each experimenter’s syntactic preferences, transitive and locative pictures were removed so that all critical trials involved dative pictures, thus increasing the number of sentences of one alternation (PD vs. DO) that participants heard from the experimenters. Second, for the same reason, all of the rounds in which the experimenter described her pictures preceded all of the rounds in which the participant described his pictures. This maximized the amount of syntactic exposure that participants received from each experimenter before describing their own pictures. As a result of these two design changes, participants heard 24 sentences of a given structure (PD or DO) before they described any pictures themselves.
Third, to verify that participants were aware that they were interacting with two distinct experimenters and could remember specific statements that were said by each person, each round began with the experimenter telling the participant a fictional but plausible fact about herself – for example, that the experimenter grew up in New York City (an uncommon occurrence among students attending a public university in California). Participants were told at the beginning of the experiment that the facts would “come up later”, and were tested on which experimenter had said each fact after the main experiment.
The fourth departure from previous experiments was that each participant was randomly assigned to one of four between-participant syntax exposure groups. As before, each experimenter described a total of 24 pictures across four rounds to the participant. In the all-PD exposure condition, both experimenters described all of their dative events using PDs. Thus participants in this condition heard a total of 24 PDs, 0 DOs, and 24 intransitives (12 PDs and 12 intransitives from each experimenter). The all-DO exposure condition was the reverse: both experimenters described all of their dative events using DOs. Thus, in this condition, participants heard a total of 0 PDs, 24 DOs, and 24 intransitives (12 DOs and 12 intransitives from each experimenter). Comparing the relative rate of participants’ PD and DO production in these two conditions will permit testing for global syntactic alignment. If hearing 24 sentences of a given alternation is sufficient to affect participants’ syntactic production, then participants in the all-PD condition should produce more PDs than do participants in the all-DO condition (following Kaschak, 2007). In the mixed condition, both experimenters produced both structures at the same rate, each describing half of her dative events using PDs and half using DOs. Thus, participants heard a total of 24 PDs, 24 DOs, and 0 intransitives (12 PDs and 12 DOs from each experimenter). In the critical, speaker-specific exposure condition, participants heard the same overall number of sentences of each structure as in the mixed condition (24 PDs, 24 DOs, and 0 intransitives). However, each experimenter described all of her pictures using only her preferred structure. Thus, participants in this condition heard all 24 DOs from Experimenter A and all 24 PDs from Experimenter B. Across the four syntax exposure conditions, all participants received the same amount of exposure to a particular structure – 24 sentences. 
If hearing 24 sentences of one structure is sufficient to affect syntactic production in one condition, it should be for the others as well. See Table 3 for a summary of the exposure conditions.”
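As a quick check on the exposure counts quoted above, the design can be written out as a small table. This is only a sketch of the counts as stated in the quoted procedure; the data frame layout and column names are my own, and the speaker-specific rows use the Experimenter A = DO, Experimenter B = PD mapping given as the example in the quote.

```r
# Exposure sentences heard from each experimenter, per condition
# (counts taken from the quoted procedure; layout is illustrative).
exposure <- data.frame(
  condition    = rep(c("All-PD", "All-DO", "Mixed", "Speaker-specific"), each = 2),
  experimenter = rep(c("A", "B"), times = 4),
  PD           = c(12, 12,  0,  0, 12, 12,  0, 24),
  DO           = c( 0,  0, 12, 12, 12, 12, 24,  0),
  intransitive = c(12, 12, 12, 12,  0,  0,  0,  0)
)

# Every experimenter should describe 24 pictures in every condition.
exposure$total <- exposure$PD + exposure$DO + exposure$intransitive

# Sum over the two experimenters to get what each participant hears.
per_condition <- aggregate(
  cbind(PD, DO, intransitive, total) ~ condition,
  data = exposure, FUN = sum
)
print(per_condition)
```

Each participant hears 48 exposure sentences in total, and any structure a participant hears at all is heard exactly 24 times, which is the property the design relies on.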
Experiment 5 Specific Procedure
“The experimental design and procedure was identical to that of Experiment 4 except that one experimenter was a non-native English speaker with a heavy Mandarin accent. She began learning English at age 9 and did not live in an English-speaking country (USA) until age 18. The second experimenter was a native (unaccented) English speaker, as in all previous experiments. There were three people who acted as the native experimenter across participants, two female and one male; different participants were tested by a different native experimenter (assigned non-systematically based on availability) paired with the same non-native experimenter. In the speaker-specific condition, syntactic preference of the native and non-native experimenter was counterbalanced across participants.”
Analysis Plan
“The included sentence productions [will be] submitted to a 4 (Syntax Exposure Condition: all-PD, speaker-specific, mixed, all-DO)× 2 (Listener: Experimenter A, Experimenter B) GLMM.”
There will also be single-structure comparisons (all-PD vs. all-DO conditions) and speaker-specific vs. mixed comparisons for both Experiment 4 and Experiment 5. Each comparison will have a visualization.
These analyses will attempt to confirm the general finding of the study: syntactic alignment does happen in conversation, but it is not specific to the person the participant is speaking to. In a conversation with two speakers, one who uses more DO structures and one who does not, participants should use more DO structures regardless of which experimenter they are speaking to.
Differences from Original Study
Unlike the original study, this reproduction will include visualizations for each analysis of Experiments 4 and 5. Everything else should be largely the same as the original, given that this is a reproduction.
Reproducibility Pipeline
The data are sourced from the project’s OSF collection and are already transcribed and cleaned. Participant responses are marked as either fitting or not fitting the expected sentence structure. The data will be entered into a binomial generalized linear mixed model (GLMM) using the glmer() function from the lme4 package. The original author of the paper noted that the preparation she did before the experiment ended up being largely unnecessary, so that will not be done. I will then use ggplot2 to create visualizations of the data. By the end of this reproduction, I hope to have visualizations similar to the original paper’s and to be able to confirm its results.
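To make the binomial part of the pipeline concrete, here is a minimal sketch on simulated data. It is deliberately simplified: it uses base R's glm() with fixed effects only, so it is not the mixed model described above, and the PD rates in `true_p` are invented for illustration. Only the column names `Target_PD` and `SyntaxExposure` mirror the real data.

```r
# Simulated stand-in data; the values are invented, not from the study.
set.seed(42)
conditions <- c("All-PD", "All-DO", "Mixed", "Speaker-specific")
true_p <- c(0.75, 0.45, 0.60, 0.60)  # hypothetical PD production rates
toy <- data.frame(
  SyntaxExposure = factor(rep(conditions, each = 200), levels = conditions)
)
toy$Target_PD <- rbinom(nrow(toy), 1, rep(true_p, each = 200))

# Fixed-effects-only logistic regression (no random effects, unlike glmer()).
fit <- glm(Target_PD ~ SyntaxExposure, data = toy,
           family = binomial(link = "logit"))

# Coefficients are on the log-odds scale; plogis() maps the linear
# predictor back to proportions for comparison with the raw means.
new_data <- data.frame(SyntaxExposure = factor(conditions, levels = conditions))
predicted <- plogis(predict(fit, newdata = new_data))
observed <- tapply(toy$Target_PD, toy$SyntaxExposure, mean)
print(round(cbind(observed, predicted), 3))
```

With a single categorical predictor the fitted proportions match the observed cell means, which is a useful sanity check before adding random effects for subjects and pictures.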
Actual Sample
The sample size for each experiment is 96 UCSD undergraduates.
Differences from pre-data collection methods plan
None
Data preparation
Data preparation following the analysis plan.
# Load packages
library(lme4) # For mixed-effects models
Loading required package: Matrix
library(car) # For sum contrasts, if needed
Loading required package: carData
library(dplyr) # For data manipulation
Attaching package: 'dplyr'
The following object is masked from 'package:car':
recode
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(ggplot2) # For plotting
library(ggthemes) # ggplot themes
library(emmeans) # Single-structure comparisons
Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'
library(papaja) #tables
Loading required package: tinylabels
library(tinylabels) #tables
# Load in data
expFour <- read.csv("/Users/averyschwartz/Desktop/Stats/Final Project/Exp4_ClearTrain.csv")
expFive <- read.csv("/Users/averyschwartz/Desktop/Stats/Final Project/Exp5_TrainInSpain.csv")
# Convert Experiment 4 variables to factors
expFour$Listener <- factor(expFour$Listener, levels = c("E1", "E2"))
expFour$SyntaxExposure <- factor(expFour$SyntaxExposure, levels = c("All-PD", "All-DO", "Mixed", "Speaker-specific"))
# Ensure DV is numeric (0/1)
expFour$Target_PD <- as.numeric(expFour$Target_PD)
# Grouping factors for random effects
expFour$Subject <- factor(expFour$Subject)
expFour$Picture <- factor(expFour$Picture)
# Convert Experiment 5 variables to factors
expFive$Listener_nativeness <- factor(expFive$Listener_nativeness, levels = c("native", "non-native"))
expFive$SyntaxExposure <- factor(expFive$SyntaxExposure, levels = c("All-PD", "All-DO", "Mixed", "Speaker-specific"))
# Ensure DV is numeric (0/1)
expFive$Target_PD <- as.numeric(expFive$Target_PD)
# Grouping factors for random effects
expFive$Subject <- factor(expFive$Subject)
expFive$Picture <- factor(expFive$Picture)
Confirmatory analysis
The analyses as specified in the analysis plan.
# Fit the full model for Experiment 4
modelFour <- glmer(
  Target_PD ~ Listener * SyntaxExposure +
    (1 + Listener * SyntaxExposure | Subject) + (1 | Picture),
  data = expFour,
  family = binomial(link = "logit"),
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 50000))
)
# Graph for modelFour
ggplot(expFour, aes(x = reorder(SyntaxExposure, desc(Target_PD)), y = Target_PD, fill = Listener)) +
  stat_summary(fun = mean, geom = "bar", position = position_dodge(), color = "black") + # Bars with mean values
  scale_y_continuous(limits = c(0, 1), expand = c(0, 0)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", position = position_dodge(width = 0.9), width = 0.2) + # Error bars with standard error
  scale_fill_manual(
    values = c("gray20", "lightgrey"),
    labels = c("Listener 1", "Listener 2") # Custom labels
  ) +
  labs(
    title = "Proportion of PD Produced by Syntax Exposure and Listener",
    x = "Syntax Exposure",
    y = "Proportion of PD Produced",
    fill = "Listener"
  ) +
  theme_linedraw()
# Graph for single-structure comparison, Experiment 4
# All-PD and All-DO
ggplot(expFour, aes(x = SyntaxExposure, y = Target_PD, fill = Listener)) +
  stat_summary(fun = "mean", geom = "bar", position = position_dodge(width = .9), color = "black", size = 0.7, na.rm = TRUE) + # Bars with mean values
  stat_summary(fun.data = "mean_se", geom = "errorbar", position = position_dodge(width = .9), width = 0.2, na.rm = TRUE) + # Error bars with standard error
  scale_fill_manual(
    values = c("gray20", "lightgrey"),
    labels = c("Listener 1", "Listener 2") # Custom labels
  ) +
  labs(
    x = "Syntax Exposure",
    y = "Proportion of PDs Produced",
    title = "Comparison of PDs Produced between All-PD and All-DO Conditions"
  ) +
  scale_y_continuous(limits = c(0, 1), expand = c(0, 0)) +
  scale_x_discrete(limits = c("All-PD", "All-DO")) + # Focus on All-PD vs All-DO
  theme_linedraw()
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
# Mixed and Speaker-specific
ggplot(expFour, aes(x = SyntaxExposure, y = Target_PD, fill = Listener)) +
  stat_summary(fun = "mean", geom = "bar", position = position_dodge(width = .9), color = "black", size = 0.7, na.rm = TRUE) + # Bars with mean values
  stat_summary(fun.data = "mean_se", geom = "errorbar", position = position_dodge(width = .9), width = 0.2, na.rm = TRUE) + # Error bars with standard error
  scale_fill_manual(
    values = c("gray20", "lightgrey"),
    labels = c("Listener 1", "Listener 2") # Custom labels
  ) +
  labs(
    x = "Syntax Exposure",
    y = "Proportion of PDs Produced",
    title = "Comparison of PDs Produced between Mixed and Speaker-specific Conditions"
  ) +
  scale_y_continuous(limits = c(0, 1), expand = c(0, 0)) +
  scale_x_discrete(limits = c("Mixed", "Speaker-specific")) + # Focus on Mixed vs Speaker-specific
  theme_linedraw()
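# Table for modelFive
# Note: tab_model() comes from the sjPlot package, which is not loaded above.
# modelFive is assumed to have been fit with glmer() in the same way as modelFour;
# that fitting call is not shown in this report.
library(sjPlot)
tab_model(
  modelFive,
  show.ci = FALSE,
  show.re.var = TRUE,
  show.icc = TRUE,
  dv.labels = "Target PD",
  title = "Effect of Listener and Syntax Exposure on Target PD",
  wrap.labels = 60,
  linebreak = TRUE,
  digits = 3
)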
boundary (singular) fit: see help('isSingular')
Effect of Listener and Syntax Exposure on Target PD
# Graph for modelFive
ggplot(expFive, aes(x = reorder(SyntaxExposure, desc(Target_PD)), y = Target_PD, fill = Listener_nativeness)) +
  stat_summary(fun = mean, geom = "bar", position = position_dodge(), color = "black") + # Bars with mean values
  scale_y_continuous(limits = c(0, 1), expand = c(0, 0)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", position = position_dodge(width = 0.9), width = 0.2) + # Error bars with standard error
  scale_fill_manual(
    values = c("gray20", "lightgrey"),
    labels = c("Native", "Non-Native") # Custom labels
  ) +
  labs(
    title = "Proportion of PD Produced by Syntax Exposure and Listener Nativeness",
    x = "Syntax Exposure",
    y = "Proportion of PD Produced",
    fill = "Listener"
  ) +
  theme_linedraw()
# Graph for single-structure comparison, Experiment 5
# All-PD and All-DO
ggplot(expFive, aes(x = SyntaxExposure, y = Target_PD, fill = Listener_nativeness)) +
  stat_summary(fun = "mean", geom = "bar", position = position_dodge(width = .9), color = "black", size = 0.7, na.rm = TRUE) + # Bars with mean values
  stat_summary(fun.data = "mean_se", geom = "errorbar", position = position_dodge(width = .9), width = 0.2, na.rm = TRUE) + # Error bars with standard error
  scale_fill_manual(
    values = c("gray20", "lightgrey"),
    labels = c("Native", "Non-Native") # Custom labels
  ) +
  labs(
    x = "Syntax Exposure",
    y = "Proportion of PDs Produced",
    title = "Comparison of PDs Produced between All-PD and All-DO Conditions"
  ) +
  scale_y_continuous(limits = c(0, 1), expand = c(0, 0)) +
  scale_x_discrete(limits = c("All-PD", "All-DO")) + # Focus on All-PD vs All-DO
  theme_linedraw()
# Mixed and Speaker-specific
ggplot(expFive, aes(x = SyntaxExposure, y = Target_PD, fill = Listener_nativeness)) +
  stat_summary(fun = "mean", geom = "bar", position = position_dodge(width = .9), color = "black", size = 0.7, na.rm = TRUE) + # Bars with mean values
  stat_summary(fun.data = "mean_se", geom = "errorbar", position = position_dodge(width = .9), width = 0.2, na.rm = TRUE) + # Error bars with standard error
  scale_fill_manual(
    values = c("gray20", "lightgrey"),
    labels = c("Native", "Non-native") # Custom labels
  ) +
  labs(
    x = "Syntax Exposure",
    y = "Proportion of PDs Produced",
    title = "Comparison of PDs Produced between Mixed and Speaker-specific"
  ) +
  scale_y_continuous(limits = c(0, 1), expand = c(0, 0)) +
  scale_x_discrete(limits = c("Mixed", "Speaker-specific")) + # Focus on Mixed vs Speaker-specific
  theme_linedraw()
Discussion
Summary of Replication Attempt
First, I observed that alignment occurred as expected: when participants were exposed to more prepositional datives (PDs), they produced more PDs themselves. The mixed and speaker-specific conditions showed less production of PDs than the PD exposure condition, and in the double-object condition, participants produced even fewer PDs. This pattern aligned with my predictions.
The critical result for the research question lies in the comparison between listener 1 and listener 2. Here, I found no significant difference in the structures participants produced based on which listener they were speaking to. Importantly, pairwise comparisons between the speaker-specific and mixed conditions also showed no significant difference. This holds for both Experiment 4 and Experiment 5.
These results indicate that participants did not respond differently to individual experimenters. Therefore, we can conclude that syntactic alignment is not partner-specific but partner-general. This directly replicates the findings of the original paper, making this a successful reproduction.
Commentary
The most notable difference between Ostrand and Ferreira (2019) and my reproduction lies in how the models were analyzed. The original paper compared the full GLMM to a reduced model and reported those comparisons. However, the paper provided very limited information about the reduced model, making this aspect difficult to reproduce. Additionally, the original author mentioned that nested effects had to be removed iteratively to achieve model convergence. In my case, I started with fewer random effects from the outset, avoiding the need for iterative adjustments.
Despite these slight differences, the findings of the original paper were successfully reproduced.
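For readers unfamiliar with full-versus-reduced model comparison, the general idea can be sketched as a likelihood-ratio test on simulated data. This is not the original authors' reduced model, which the paper does not fully specify; the data, effect sizes, and the choice to drop the `SyntaxExposure` term are all assumptions made for illustration. It uses lme4's glmer() with by-subject random intercepts only.

```r
library(lme4)

# Simulated data: 40 hypothetical subjects, 12 trials each,
# with exposure condition varying between subjects.
set.seed(1)
n_subj <- 40
n_trials <- 12
sim <- expand.grid(Subject = factor(seq_len(n_subj)), trial = seq_len(n_trials))
sim$SyntaxExposure <- factor(
  ifelse(as.integer(sim$Subject) <= n_subj / 2, "All-PD", "All-DO"),
  levels = c("All-PD", "All-DO")
)

# Invented effect sizes on the log-odds scale.
subj_intercept <- rnorm(n_subj, mean = 0, sd = 0.5)
eta <- 0.8 + subj_intercept[as.integer(sim$Subject)] +
  ifelse(sim$SyntaxExposure == "All-DO", -1.2, 0)
sim$Target_PD <- rbinom(nrow(sim), 1, plogis(eta))

# Full model includes the fixed effect of interest; reduced model omits it.
full <- glmer(Target_PD ~ SyntaxExposure + (1 | Subject),
              data = sim, family = binomial)
reduced <- glmer(Target_PD ~ 1 + (1 | Subject),
                 data = sim, family = binomial)

# anova() on two nested glmer fits reports a likelihood-ratio (chi-square) test.
lrt <- anova(reduced, full)
print(lrt)
```

The chi-square statistic in the second row of the output tests whether adding the dropped term improves fit, which is the kind of comparison the original paper reports for each effect.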