library(pwr)
Replication of Study Rapid Word Learning Under Uncertainty via Cross-Situational Statistics by Yu & Smith (2007, Psychological Science)
Introduction
In “Rapid Word Learning Under Uncertainty via Cross-Situational Statistics,” Yu and Smith (2007) explored how adults learn word-referent pairs in highly ambiguous settings. Past studies of word learning focused on constraints such as social, attentional, or linguistic cues to solve the word-referent mapping problem. While these strategies perform well in controlled, minimally ambiguous contexts, real-world learning environments present learners with greater complexity.
This raises an important question: can learners acquire word-referent pairs in highly ambiguous settings through alternative means, even when they cannot determine the correct pairings within a single trial? To address this question, Yu and Smith proposed an alternative mechanism, cross-situational learning. They demonstrated that learners can track word-referent pairings across multiple trials by calculating statistical associations over time rather than relying on immediate clarity within each learning instance.
Design Overview
- One factor was manipulated in the study: within-trial ambiguity. The manipulation operates through three conditions in which the number of words and referents presented per trial varied (2×2, 3×3, and 4×4).
- Two measures were taken: accuracy in learning word-referent pairs and response time.
- The study employed a within-participants design as each participant experienced all three conditions.
- Measures were repeated across each condition for every participant.
- Applying a between-participants design instead of a within-participants design would increase variance due to individual learning differences.
- The study reduced demand characteristics by using pseudowords and by not providing explicit cues linking words to specific referents, so participants had to rely solely on cross-trial statistical learning.
- A potential confound is the repeated exposure to pseudowords and objects, which could lead participants to develop strategies based on familiarity or memorization rather than on cross-trial statistical learning.
- The use of pseudowords and uncommon objects may limit generalizability to real-world language learning, where learners often have social and contextual cues available. Also, testing was limited to adult participants, so findings may not generalize well to children.
Methods
Power Analysis
effect_size <- 1.425
alpha <- 0.05
result <- pwr.t.test(d = effect_size, n = 38, sig.level = alpha, alternative = "greater")
print(result)
Two-sample t test power calculation
n = 38
d = 1.425
sig.level = 0.05
power = 0.9999967
alternative = greater
NOTE: n is number in *each* group
Using an effect size of d = 1.425 derived from the data given in the original study, we found that 38 participants per group yield very high statistical power (0.9999967). This indicates that the probability of correctly rejecting the null hypothesis, if the alternative hypothesis is true, is nearly 100%.
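As a cross-check that does not depend on the pwr package, the same calculation can be sketched with base R's power.t.test. The sketch assumes the same inputs as above (d = 1.425 expressed as delta with sd = 1, n = 38 per group, one-sided alpha = 0.05). The object names are illustrative, and the second call, which solves for the per-group n at 80% power, is an added illustration rather than part of the original plan.

```r
# Base-R cross-check of the power calculation (stats::power.t.test).
# Assumptions: d = 1.425 is expressed as delta with sd = 1; names are illustrative.
effect_size <- 1.425
alpha <- 0.05

# Power for n = 38 per group, one-sided two-sample t test
check <- power.t.test(n = 38, delta = effect_size, sd = 1,
                      sig.level = alpha, type = "two.sample",
                      alternative = "one.sided")
print(check$power)

# Per-group n needed to reach 80% power under the same assumptions
needed <- power.t.test(delta = effect_size, sd = 1, power = 0.80,
                       sig.level = alpha, type = "two.sample",
                       alternative = "one.sided")
print(ceiling(needed$n))
```

Because the design is within-participants, type = "paired" may be the closer match to the planned comparison; that choice is left open here.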
Planned Sample
Thirty-eight participants were recruited for the original study, and they received either course credit or $7. Our replication will aim for a similar or slightly larger sample, recruited from Prolific, to maintain consistency with the original design.
Materials
“The stimuli were slides containing pictures of uncommon objects (e.g., canister, facial sauna, and rasp) paired with auditorily presented pseudowords. These artificial words were generated by a computer program to sample English forms that were broadly phonotactically probable; they were produced by a synthetic female voice in monotone. There were 54 unique objects and 54 unique pseudowords partitioned into three sets of 18 words and referents for use in the three conditions. The training trials were generated by randomly pairing each word with one picture; these were the word-referent pairs to be discovered by the learner. The three learning conditions differed in the number of words and referents presented on each training trial: 2-2 Condition: 2 words and 2 pictures; 3-3 Condition: 3 words and 3 pictures; 4-4 Condition: 4 words and 4 pictures” (Yu and Smith 2007)
Procedure
“The pictures were presented on a 17-in. computer screen, and the sound was played by the speakers connected to the same computer. Subjects were instructed that their task was to learn the words and referents, but they were not told that there was one referent per word. They were told that multiple words and pictures would co-occur on each trial and that their task was to figure out across trials which word went with which picture. After training in each condition, subjects received a four-alternative forced-choice test of learning. On the test, they were presented with 1 word and 4 pictures and asked to indicate the picture named by that word. The target picture and the 3 foils were all drawn from the set of 18 training pictures.” (Yu and Smith 2007)
Analysis Plan
The primary analysis will involve a one-way ANOVA to compare learning accuracy across the three conditions (2×2, 3×3, and 4×4). In this setup, the independent variable is the condition (level of ambiguity), and the dependent variable is the accuracy of word-object pair identification.
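Because every participant completes all three conditions, one option is a repeated-measures version of this ANOVA. The following is a minimal sketch on simulated data, not the replication data. The column names (participant, Condition, accuracy), the 10 simulated participants, and the simulated condition means are all assumptions made for illustration.

```r
# Repeated-measures ANOVA sketch on simulated per-participant accuracies.
# All data and column names here are hypothetical.
set.seed(1)
n_participants <- 10
sim <- expand.grid(
  participant = factor(seq_len(n_participants)),
  Condition   = factor(c("2x2", "3x3", "4x4"))
)

# Assumed condition means, for illustration only
cond_means <- c("2x2" = 0.85, "3x3" = 0.75, "4x4" = 0.55)
sim$accuracy <- cond_means[as.character(sim$Condition)] +
  rnorm(nrow(sim), mean = 0, sd = 0.10)
sim$accuracy <- pmin(pmax(sim$accuracy, 0), 1)  # keep values within [0, 1]

# Condition is tested within participants via the Error() term
rm_fit <- aov(accuracy ~ Condition + Error(participant / Condition), data = sim)
print(summary(rm_fit))
```

In this layout each participant contributes one accuracy score per condition, so the Condition effect is evaluated against the participant-by-condition error stratum rather than against trial-level residuals.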
We will also examine response times across conditions to investigate whether higher ambiguity affects the speed of responding, which may contribute to understanding cognitive processing under different conditions.
Data cleaning will include the exclusion of trials where response times are excessively high or low to account for inattentiveness or random guessing. Also, participants performing below chance level overall will be excluded from the analysis, as this suggests they may not have engaged meaningfully with the task.
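The exclusion rules can be sketched in base R as follows. The data, the column name participant_id, and the response-time cutoffs (200 ms and 30,000 ms) are hypothetical placeholders, since the plan does not yet fix specific thresholds. The chance level of 0.25 assumes the four-alternative test described in the Procedure.

```r
# Hypothetical trial-level data; column names and cutoffs are assumptions.
trials <- data.frame(
  participant_id = rep(c("p1", "p2", "p3"), each = 6),
  response_time  = c(1500, 2100, 90, 1800, 2500, 60000,
                     1700, 1900, 2200, 2000, 1600, 2400,
                     1200, 1300, 1100, 1400, 1250, 1350),
  correct        = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE,
                     TRUE, FALSE, TRUE, TRUE, TRUE, FALSE,
                     FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

rt_min <- 200     # assumed lower bound in ms
rt_max <- 30000   # assumed upper bound in ms
chance <- 0.25    # assumed chance level for a four-alternative test

# Step 1: drop trials with implausibly fast or slow responses
in_range <- trials$response_time >= rt_min & trials$response_time <= rt_max
kept_trials <- trials[in_range, ]

# Step 2: overall accuracy per participant on the remaining trials
acc <- tapply(kept_trials$correct, kept_trials$participant_id, mean)

# Step 3: keep only participants at or above chance
keep_ids <- names(acc)[acc >= chance]
clean_trials <- kept_trials[kept_trials$participant_id %in% keep_ids, ]
print(table(clean_trials$participant_id))
```

Applying the trial-level filter before the participant-level filter is itself a design choice; reversing the order could change which participants fall below chance.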
Differences from Original Study
Sample: The original study included 38 undergraduate participants from Indiana University. Our sample may differ slightly due to recruitment constraints; participants will probably be drawn from a broader demographic pool, which could introduce variability in learning abilities or prior exposure to similar experimental tasks. However, as cross-situational learning mechanisms are believed to be consistent across adult populations, the sample difference is not expected to significantly affect the findings.
Setting: In the original study, participants completed the trials in a controlled lab environment. Our replication may be conducted entirely online. Running the experiment outside of a laboratory could introduce additional distractions or variations. As the original research suggests that cross-situational learning effects are resilient to minor environmental changes, we do not expect this variation to significantly influence the outcome.
Methods Addendum (Post Data Collection)
You can comment this section out prior to final report with data collection.
Actual Sample
Sample size, demographics, data exclusions based on rules spelled out in analysis plan
Differences from pre-data collection methods plan
Any differences from what was described as the original plan, or “none”.
Results
Data preparation
- Data from the experiment will be imported in a format compatible with R. The dataset will include participant IDs, condition labels (2×2, 3×3, 4×4), trial responses, accuracy scores, and response times for each trial.
- As noted above, trials with response times substantially above or below the mean, and participants who perform below chance level across conditions, will be excluded.
- After filtering the data, we will calculate mean accuracy for each participant across conditions (2×2, 3×3, 4×4) and create a new column for analysis.
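The aggregation step in the last bullet can be sketched in base R as below. The data are invented, and the column names participant_id and mean_accuracy are assumptions for illustration (the pilot files shown later do not include a participant column in the selected data).

```r
# Hypothetical trial-level data: 2 participants x 3 conditions x 4 test trials.
trials <- data.frame(
  participant_id = rep(c("p1", "p2"), each = 12),
  Condition      = rep(rep(c("2x2", "3x3", "4x4"), each = 4), times = 2),
  correct        = c(TRUE, TRUE, TRUE, FALSE,    # p1, 2x2
                     TRUE, TRUE, FALSE, FALSE,   # p1, 3x3
                     TRUE, FALSE, FALSE, FALSE,  # p1, 4x4
                     TRUE, TRUE, TRUE, TRUE,     # p2, 2x2
                     TRUE, TRUE, TRUE, FALSE,    # p2, 3x3
                     TRUE, TRUE, FALSE, FALSE)   # p2, 4x4
)

# Mean accuracy per participant and condition (rows = participants)
acc_wide <- tapply(trials$correct,
                   list(trials$participant_id, trials$Condition),
                   mean)
print(acc_wide)

# Long format with a mean_accuracy column, one row per participant x condition
acc_long <- as.data.frame(as.table(acc_wide), responseName = "mean_accuracy")
names(acc_long)[1:2] <- c("participant_id", "Condition")
print(acc_long)
```

The long format, with one row per participant and condition, is the shape that participant-level analyses typically expect.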
Data Preparation
Load Relevant Libraries and Functions
library(jsonlite)
library(dplyr)
library(ggplot2)
library(pwr)
library(effectsize)
library(car)
library(tidyr)
Import data
# For each condition, read every participant CSV in the folder and stack the rows
setwd("/Users/yawendong/Documents/GitHub/psych final project/pilotB_data/Condition1")
files1 <- list.files(pattern = "\\.csv$")
data1 <- lapply(files1, read.csv) %>% bind_rows()

setwd("/Users/yawendong/Documents/GitHub/psych final project/pilotB_data/Condition2")
files2 <- list.files(pattern = "\\.csv$")
data2 <- lapply(files2, read.csv) %>% bind_rows()

setwd("/Users/yawendong/Documents/GitHub/psych final project/pilotB_data/Condition3")
files3 <- list.files(pattern = "\\.csv$")
data3 <- lapply(files3, read.csv) %>% bind_rows()
Data exclusion / filtering
# Keep the analysis columns, drop incomplete rows, and store accuracy as logical
selected_data1 <- data1 %>% select(correct_choice, correct_image, response_letter, correct, response_time)
cleaned_data1 <- na.omit(selected_data1)
cleaned_data1$correct <- as.logical(cleaned_data1$correct)

selected_data2 <- data2 %>% select(correct_choice, correct_image, response_letter, correct, response_time)
cleaned_data2 <- na.omit(selected_data2)
cleaned_data2$correct <- as.logical(cleaned_data2$correct)

selected_data3 <- data3 %>% select(correct_choice, correct_image, response_letter, correct, response_time)
cleaned_data3 <- na.omit(selected_data3)
cleaned_data3$correct <- as.logical(cleaned_data3$correct)
Prepare data for analysis - create columns etc.
cleaned_data1$Condition <- 'Condition1'
cleaned_data2$Condition <- 'Condition2'
cleaned_data3$Condition <- 'Condition3'
combined_data <- bind_rows(cleaned_data1, cleaned_data2, cleaned_data3)
Confirmatory analysis
We recruited 5 participants to complete the 2 × 2 (Condition 1), 3 × 3 (Condition 2), and 4 × 4 (Condition 3) conditions in Pilot Test B. Participants took 22 min on average to finish all three conditions, which is close to what we expected.
Accuracy
Overall Accuracy
accuracy <- combined_data %>%
  group_by(Condition) %>%
  summarise(Accuracy = mean(correct), .groups = 'drop')
print(accuracy)
# A tibble: 3 × 2
Condition Accuracy
<chr> <dbl>
1 Condition1 0.733
2 Condition2 0.722
3 Condition3 0.556
Accuracy was highest in Condition1 at 73.3%, slightly lower in Condition2 at 72.2%, and markedly lower in Condition3 at 55.6%. This pattern suggests that increased ambiguity reduces accuracy.
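To give a sense of the uncertainty around these pilot proportions, exact binomial confidence intervals can be sketched as follows. The counts are an assumption: 90 test trials per condition is consistent with the 270 total observations implied by the ANOVA degrees of freedom reported below, and the accuracies then correspond to 66, 65, and 50 correct responses. The intervals treat trials as independent, which ignores that trials are nested within 5 participants, so they are likely too narrow.

```r
# Exact binomial 95% CIs for per-condition accuracy.
# Assumption: 90 test trials per condition, so the reported proportions
# correspond to 66, 65, and 50 correct responses.
n_trials  <- 90
n_correct <- c(Condition1 = 66, Condition2 = 65, Condition3 = 50)

ci <- t(sapply(n_correct, function(k) {
  bt <- binom.test(k, n_trials)
  c(accuracy = k / n_trials, lower = bt$conf.int[1], upper = bt$conf.int[2])
}))
print(round(ci, 3))
```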
Accuracy over images
accuracy_by_image <- combined_data %>%
  group_by(Condition, correct_image) %>%
  summarise(Accuracy = mean(correct), .groups = 'drop')
# linewidth replaces the size argument for lines, which is deprecated as of ggplot2 3.4.0
ggplot(accuracy_by_image, aes(x = correct_image, y = Accuracy, group = Condition, color = Condition)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(
    title = "Accuracy by Image for Each Condition",
    x = "Image",
    y = "Accuracy"
  ) +
  scale_x_continuous(breaks = unique(accuracy_by_image$correct_image)) +
  facet_wrap(~Condition, scales = "free_x", ncol = 1) +
  theme_minimal()
The plot shows that accuracy is relatively stable across images overall. Condition 3 is the least stable: its accuracy fluctuates widely and frequently dips below 0.5, suggesting that increased ambiguity leads to inconsistent performance across images.
Reaction Times
Overall reaction times
reaction_time <- combined_data %>%
  group_by(Condition) %>%
  summarise(
    Mean_ReactionTime = mean(response_time),
    SD_ReactionTime = sd(response_time),
    .groups = 'drop'
  )
print(reaction_time)
# A tibble: 3 × 3
Condition Mean_ReactionTime SD_ReactionTime
<chr> <dbl> <dbl>
1 Condition1 4014. 7403.
2 Condition2 2767. 1561.
3 Condition3 3042. 1409.
Average reaction times are highest in Condition1 (4014.09 ms) and shortest in Condition2 (about 2767 ms). Reaction times in Condition1 also show much higher variability (SD of about 7403 ms) than those in the other two conditions (about 1561 ms and 1409 ms).
Reaction times over images
reaction_time_by_image <- combined_data %>%
  group_by(Condition, correct_image) %>%
  summarise(
    Mean_ReactionTime = mean(response_time, na.rm = TRUE),
    .groups = 'drop'
  )
# linewidth replaces the size argument for lines, which is deprecated as of ggplot2 3.4.0
ggplot(reaction_time_by_image, aes(x = correct_image,
                                   y = Mean_ReactionTime,
                                   group = Condition,
                                   color = Condition)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(
    title = "Reaction Times by Image for Each Condition",
    x = "Image",
    y = "Mean Reaction Time (ms)"
  ) +
  scale_x_continuous(breaks = unique(reaction_time_by_image$correct_image)) +
  facet_wrap(~Condition, scales = "free_x", ncol = 1) +
  theme_minimal()
Reaction times in Condition 1 are generally stable across images, except for a sharp spike at Image 8. With our small sample, this spike could be driven by a single slow response, or it might suggest that participants encountered a specific difficulty with that image. Reaction times are relatively consistent across all images in Conditions 2 and 3.
Chance Performance
# The expected performance by chance for 2*2, 3*3, and 4*4 Condition is 1/4, 1/9, and 1/16
combined_data <- combined_data %>%
  mutate(chance_level = case_when(
    Condition == "Condition1" ~ 0.25,
    Condition == "Condition2" ~ 0.1111,
    Condition == "Condition3" ~ 0.0625
  ))
t_test_results <- combined_data %>%
  group_by(Condition) %>%
  summarise(
    t_test_p_value = t.test(correct, mu = unique(chance_level))$p.value,
    .groups = 'drop'
  )
print(t_test_results)
# A tibble: 3 × 2
Condition t_test_p_value
<chr> <dbl>
1 Condition1 7.31e-17
2 Condition2 4.99e-22
3 Condition3 6.72e-15
The t-tests indicate that participants performed better than the specified chance levels in all conditions, which is consistent with cross-situational learning and aligns with the findings of the original study.
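Since the Procedure describes a four-alternative forced-choice test, a chance level of 0.25 in every condition may also be worth checking. The sketch below runs one-sided exact binomial tests against 0.25. It rests on assumptions: 90 test trials per condition and counts of 66, 65, and 50 correct responses inferred from the reported accuracies, with trials treated as independent. It is an illustrative alternative, not the analysis reported above.

```r
# One-sided exact binomial tests against a four-alternative chance level of 0.25.
# Assumptions: 90 test trials per condition; counts inferred from the
# reported accuracies (66, 65, and 50 correct).
n_trials    <- 90
n_correct   <- c(Condition1 = 66, Condition2 = 65, Condition3 = 50)
chance_4afc <- 0.25

p_values <- sapply(n_correct, function(k) {
  binom.test(k, n_trials, p = chance_4afc, alternative = "greater")$p.value
})
print(p_values)
```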
Effect of Condition (ANOVA)
anova <- aov(correct ~ Condition, data = combined_data)
summary(anova)
Df Sum Sq Mean Sq F value Pr(>F)
Condition 2 1.79 0.8926 4.118 0.0173 *
Residuals 267 57.88 0.2168
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA shows a significant effect of condition on accuracy, F(2, 267) = 4.118, p = .0173, meaning that the level of ambiguity in word-referent pairings affects participants’ performance.
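The size of this effect can be read off the ANOVA table directly. The short sketch below uses only the rounded values printed in the table to compute eta squared (the share of total variance attributable to Condition) and to recompute the p-value from F and its degrees of freedom. The variable names are illustrative.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_condition <- 1.79
ss_residual  <- 57.88
f_value      <- 4.118
df_condition <- 2
df_residual  <- 267

# Eta squared: share of total variance attributable to Condition
eta_sq <- ss_condition / (ss_condition + ss_residual)
print(round(eta_sq, 3))

# Recompute the p-value from F and its degrees of freedom
p_value <- pf(f_value, df_condition, df_residual, lower.tail = FALSE)
print(round(p_value, 4))
```

An eta squared of about 0.03 means that Condition accounts for roughly 3% of the trial-level variance in accuracy in this pilot.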
Post-Hoc Analysis
tukey <- TukeyHSD(anova)
print(tukey)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = correct ~ Condition, data = combined_data)
$Condition
diff lwr upr p adj
Condition2-Condition1 -0.01111111 -0.1746902 0.152467963 0.9859710
Condition3-Condition1 -0.17777778 -0.3413569 -0.014198703 0.0294468
Condition3-Condition2 -0.16666667 -0.3302457 -0.003087592 0.0447158
The analysis shows no statistically significant difference between Condition1 and Condition2 (adjusted p = .986), while Condition3 shows lower accuracy than both Condition1 (adjusted p = .029) and Condition2 (adjusted p = .045).
Additional Links
- Stimuli: https://github.com/ucsd-psych201a/yu2007/tree/main/stimuli
- Code: https://ucsd-psych201a.github.io/yu2007/final_coding_11.22.html
Exploratory analyses
Any follow-up analyses desired (not required).
Discussion
Summary of Replication Attempt
Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.
Commentary
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.