Replication of Study Rapid Word Learning Under Uncertainty via Cross-Situational Statistics by Yu & Smith (2007, Psychological Science)

Author

Alison Park, Junyi Hui, Pengjia Cui, and Yawen Dong

Published

November 22, 2024

Introduction

The study Rapid Word Learning Under Uncertainty via Cross-Situational Statistics by Yu and Smith (2007) explored how adults learn word-referent pairs in highly ambiguous settings. Past studies on word learning have focused on constraints such as social, attentional, or linguistic cues to solve the word-referent mapping problem. While these strategies perform well in controlled, minimally ambiguous contexts, real-world learning environments present learners with greater complexity.

This raises an important question: can learners successfully acquire word-referent pairs in highly ambiguous settings through alternative means, even when they cannot determine correct pairings within a single trial? To address this question, Yu and Smith proposed an alternative mechanism, cross-situational learning. They demonstrated that learners could track word-referent pairings across multiple trials by calculating statistical associations over time rather than relying on immediate clarity within each learning instance.
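
The idea of accumulating word-referent statistics across trials can be illustrated with a small simulation. This is only a sketch of one simple co-occurrence-counting learner, not the model or procedure from Yu and Smith; the number of trials (`n_trials`), the random construction of trials, and all variable names are assumptions made for illustration.

```r
# Minimal co-occurrence learner (illustrative; not the original authors' model).
set.seed(2007)

n_pairs   <- 18  # word-referent pairs in one condition
per_trial <- 4   # words and referents shown together on each trial (4x4)
n_trials  <- 27  # assumed number of training trials

# cooc[w, r] counts how often word w and referent r appeared on the same trial.
# In this simulation the true referent of word i is referent i.
cooc <- matrix(0, nrow = n_pairs, ncol = n_pairs)

for (trial in seq_len(n_trials)) {
  shown <- sample(n_pairs, per_trial)           # pairs present on this trial
  cooc[shown, shown] <- cooc[shown, shown] + 1  # each word co-occurs with each referent shown
}

# For each word, guess the referent it co-occurred with most often.
# which.max returns the first maximum, so ties are not broken at random.
guess <- apply(cooc, 1, which.max)
sim_accuracy <- mean(guess == seq_len(n_pairs))
print(sim_accuracy)
```

Within a single trial every pairing is equally plausible, but the true referent is present every time its word is, so its count can never be smaller than that of any foil. That is the statistical signal a cross-situational learner could exploit.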

Design Overview

One factor was manipulated in the study: within-trial ambiguity. The manipulation operated through three conditions that varied the number of words and referents presented per trial (2×2, 3×3, and 4×4).
Two measures were taken: accuracy in learning word-referent pairs and response time.
The study employed a within-participants design as each participant experienced all three conditions.
Measures were repeated across each condition for every participant.
Applying a between-participants design instead of a within-participants design would increase variance due to individual learning differences.
The study reduced demand characteristics by using pseudowords and not providing explicit cues linking words to specific referents, so participants had to rely solely on cross-trial statistical learning.
A potential confound is the repeated exposure to pseudowords and objects, which could lead participants to develop strategies based on familiarity or memorization rather than on cross-trial statistical learning.
The use of pseudowords and uncommon objects may limit generalizability to real-world language learning, where learners often have social and contextual cues available. Also, testing was limited to adult participants, so findings may not generalize well to children.

Methods

Power Analysis

library(pwr)

effect_size<-1.425
alpha<-0.05
result <- pwr.t.test(d = effect_size, n = 38, sig.level = alpha,alternative="greater")
print(result)


     Two-sample t test power calculation 

              n = 38
              d = 1.425
      sig.level = 0.05
          power = 0.9999967
    alternative = greater

NOTE: n is number in *each* group

With the data given in the original study, we found that 38 participants per group yield very high statistical power (greater than 0.999). This indicates that the probability of correctly rejecting the null hypothesis, if the alternative hypothesis is true, is nearly 100%.
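
Because every participant completes all three conditions and accuracy is compared against a fixed chance level, a one-sample calculation may match the design more closely than the two-sample default used above. The sketch below is an assumption about how that could be set up with `pwr.t.test`; it reuses the effect size from the calculation above and also solves for the sample size that would give 80% power, a conventional target chosen here only for illustration.

```r
library(pwr)

effect_size <- 1.425
alpha <- 0.05

# Power for a one-sample, one-sided t test with 38 participants.
# type = "one.sample" is an assumption about the intended test.
one_sample <- pwr.t.test(d = effect_size, n = 38, sig.level = alpha,
                         type = "one.sample", alternative = "greater")
print(one_sample)

# Leave n unspecified to solve for the sample size that reaches 80% power.
needed <- pwr.t.test(d = effect_size, power = 0.80, sig.level = alpha,
                     type = "one.sample", alternative = "greater")
print(ceiling(needed$n))
```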

Planned Sample

Thirty-eight participants were recruited for the original study, and they received either course credit or $7. To stay consistent with the original design, our replication will aim for a similar or slightly larger sample, recruited from Prolific.

Materials

“The stimuli were slides containing pictures of uncommon objects (e.g., canister, facial sauna, and rasp) paired with auditorily presented pseudowords. These artificial words were generated by a computer program to sample English forms that were broadly phonotactically probable; they were produced by a synthetic female voice in monotone. There were 54 unique objects and 54 unique pseudowords partitioned into three sets of 18 words and referents for use in the three conditions. The training trials were generated by randomly pairing each word with one picture; these were the word-referent pairs to be discovered by the learner. The three learning conditions differed in the number of words and referents presented on each training trial: 2-2 Condition: 2 words and 2 pictures; 3-3 Condition: 3 words and 3 pictures; 4-4 Condition: 4 words and 4 pictures” (Yu and Smith 2007)

Procedure

“The pictures were presented on a 17-in. computer screen, and the sound was played by the speakers connected to the same computer. Subjects were instructed that their task was to learn the words and referents, but they were not told that there was one referent per word. They were told that multiple words and pictures would co-occur on each trial and that their task was to figure out across trials which word went with which picture. After training in each condition, subjects received a four-alternative forced-choice test of learning. On the test, they were presented with 1 word and 4 pictures and asked to indicate the picture named by that word. The target picture and the 3 foils were all drawn from the set of 18 training pictures.” (Yu and Smith 2007)

Analysis Plan

The primary analysis will involve a one-way ANOVA to compare learning accuracy across the three conditions (2×2, 3×3, and 4×4). In this setup, the independent variable is the condition (level of ambiguity), and the dependent variable is the accuracy of word-object pair identification.
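
Since condition is a within-participants factor, one option is to run the ANOVA as a repeated-measures model on per-participant accuracy. The following is a minimal sketch on simulated data; the column names (`participant_id`, `condition`, `accuracy`) and the simulated values are hypothetical and do not come from the study.

```r
set.seed(1)

# Simulated per-participant accuracy (hypothetical values, for illustration only)
n_participants <- 10
sim <- expand.grid(
  participant_id = factor(seq_len(n_participants)),
  condition      = factor(c("2x2", "3x3", "4x4"))
)
sim$accuracy <- runif(nrow(sim), min = 0.25, max = 1)

# Error(participant_id / condition) tells aov that condition varies within participants
rm_fit <- aov(accuracy ~ condition + Error(participant_id / condition), data = sim)
print(summary(rm_fit))
```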

We will also examine response times across conditions to investigate whether higher ambiguity affects response speed at test, which may help characterize cognitive processing under different levels of ambiguity.

Data cleaning will include the exclusion of trials where response times are excessively high or low to account for inattentiveness or random guessing. Also, participants performing below chance level overall will be excluded from the analysis, as this suggests they may not have engaged meaningfully with the task.
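
The two exclusion rules can be sketched as a pair of `dplyr` filters. The response-time bounds (`rt_min`, `rt_max`), the 0.25 chance level, and the `participant_id` column are assumptions for illustration; the plan above does not fix specific cutoffs, and the toy data are invented.

```r
library(dplyr)

# Toy trial-level data (invented values; participant_id is a hypothetical column)
trials <- data.frame(
  participant_id = rep(c("p1", "p2"), each = 4),
  correct        = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  response_time  = c(1500, 90, 2200, 45000, 1800, 2100, 1900, 2500)
)

rt_min <- 200    # assumed lower bound in ms
rt_max <- 30000  # assumed upper bound in ms
chance <- 0.25   # assumed chance level for a four-alternative test

kept <- trials %>%
  # Step 1: drop trials with implausibly fast or slow responses
  filter(response_time >= rt_min, response_time <= rt_max) %>%
  # Step 2: drop participants whose remaining accuracy is below chance
  group_by(participant_id) %>%
  filter(mean(correct) >= chance) %>%
  ungroup()

print(kept)
```

In this toy example two of p1's trials are removed for extreme response times, and p2 is removed entirely for below-chance accuracy.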

Differences from Original Study

Sample: The original study included 38 undergraduate participants from Indiana University. Our sample may differ slightly due to recruitment constraints; participants will probably be drawn from a broader demographic pool, which could introduce variability in learning abilities or prior exposure to similar experimental tasks. However, as cross-situational learning mechanisms are believed to be consistent across adult populations, the sample difference is not expected to significantly impact the findings.
Setting: In the original study, participants completed the trials in a controlled lab environment. Our replication may be conducted entirely online. Conducting the experiment outside of a laboratory could introduce additional distractions or variations. As the original research suggests that cross-situational learning effects are resilient to minor environmental changes, we do not expect this variation to significantly influence the outcome.

Methods Addendum (Post Data Collection)

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data from the experiment will be imported in a format compatible with R. The dataset will include participant IDs, condition labels (2×2, 3×3, 4×4), trial responses, accuracy scores, and response times for each trial.
As noted above, trials with excessively high or low response times and participants who perform below chance level across conditions will be excluded.
After filtering the data, we will calculate mean accuracy for each participant across conditions (2×2, 3×3, 4×4) and create a new column for analysis.
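
The per-participant summary described above could look like the following sketch. The `participant_id` column is hypothetical (it is an assumption about how participants would be identified), and the toy responses are invented.

```r
library(dplyr)
library(tidyr)

# Toy trial-level data (invented values; participant_id is a hypothetical column)
trials <- data.frame(
  participant_id = rep(c("p1", "p2"), each = 6),
  Condition      = rep(rep(c("Condition1", "Condition2", "Condition3"), each = 2),
                       times = 2),
  correct        = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
                     TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
)

# One row per participant and condition, holding that participant's mean accuracy
by_participant <- trials %>%
  group_by(participant_id, Condition) %>%
  summarise(accuracy = mean(correct), .groups = "drop")

# Optional wide layout: one row per participant, one accuracy column per condition
wide <- by_participant %>%
  pivot_wider(names_from = Condition, values_from = accuracy)

print(wide)
```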

Load Relevant Libraries and Functions

library(jsonlite)
library(dplyr)
library(ggplot2)
library(pwr)
library(effectsize)
library(car)
library(tidyr)

Import data

# Base folder for the pilot data (path is specific to the authors' machine)
data_dir <- "/Users/yawendong/Documents/GitHub/psych final project/pilotB_data"

# Read every CSV file in one condition folder and stack the rows
read_condition <- function(condition) {
  files <- list.files(file.path(data_dir, condition),
                      pattern = "\\.csv$", full.names = TRUE)
  lapply(files, read.csv) %>% bind_rows()
}

data1 <- read_condition("Condition1")
data2 <- read_condition("Condition2")
data3 <- read_condition("Condition3")
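
If each CSV file holds one participant's responses (an assumption; the file layout is not described here), the file name can be carried along as an identifier while stacking, which makes per-participant summaries possible later. The sketch below writes two invented files to a temporary folder so it runs on its own; `source_file` and the file names are hypothetical.

```r
library(dplyr)

# Two small CSV files in a temporary folder stand in for the real data files
demo_dir <- file.path(tempdir(), "condition_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(correct = c(1, 0), response_time = c(1500, 2300)),
          file.path(demo_dir, "participant_a.csv"), row.names = FALSE)
write.csv(data.frame(correct = c(1, 1), response_time = c(1800, 2100)),
          file.path(demo_dir, "participant_b.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
names(files) <- basename(files)  # list names become the values of the .id column

# bind_rows(.id = ...) adds a column recording which file each row came from
demo_data <- lapply(files, read.csv) %>% bind_rows(.id = "source_file")
print(demo_data)
```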

Data exclusion / filtering

selected_data1 <- data1 %>% select(correct_choice, correct_image, response_letter, correct, response_time)
cleaned_data1 <- na.omit(selected_data1)
cleaned_data1$correct <- as.logical(cleaned_data1$correct)

selected_data2 <- data2 %>% select(correct_choice, correct_image, response_letter, correct, response_time)
cleaned_data2 <- na.omit(selected_data2)
cleaned_data2$correct <- as.logical(cleaned_data2$correct)

selected_data3 <- data3 %>% select(correct_choice, correct_image, response_letter, correct, response_time)
cleaned_data3 <- na.omit(selected_data3)
cleaned_data3$correct <- as.logical(cleaned_data3$correct)

Prepare data for analysis - create columns etc.

cleaned_data1$Condition <- 'Condition1'
cleaned_data2$Condition <- 'Condition2'
cleaned_data3$Condition <- 'Condition3'

combined_data <- bind_rows(cleaned_data1, cleaned_data2, cleaned_data3)

Confirmatory analysis

We recruited 5 participants to complete the 2×2 (Condition 1), 3×3 (Condition 2), and 4×4 (Condition 3) conditions in Pilot Test B. Participants took 22 min on average to finish all three conditions, which is close to what we expected.

Accuracy

Overall Accuracy

accuracy <- combined_data %>%
  group_by(Condition) %>%
  summarise(Accuracy = mean(correct), .groups = 'drop')

print(accuracy)

# A tibble: 3 × 2
  Condition  Accuracy
  <chr>         <dbl>
1 Condition1    0.733
2 Condition2    0.722
3 Condition3    0.556

Accuracy was highest in Condition1 at 73.3%, slightly lower in Condition2 at 72.2%, and markedly lower in Condition3 at 55.6%. This pattern suggests that increased ambiguity reduces accuracy.

Accuracy over images

accuracy_by_image <- combined_data %>%
  group_by(Condition, correct_image) %>%
  summarise(Accuracy = mean(correct), .groups = 'drop')

ggplot(accuracy_by_image, aes(x = correct_image, y = Accuracy, group = Condition, color = Condition)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(
    title = "Accuracy by Image for Each Condition",
    x = "Image",
    y = "Accuracy"
  ) +
  scale_x_continuous(breaks = unique(accuracy_by_image$correct_image)) +
  facet_wrap(~Condition, scales = "free_x", ncol = 1) +
  theme_minimal()

The plot shows that accuracy is relatively stable across images in Conditions 1 and 2. Condition 3 is the least stable, with accuracy showing large fluctuations and frequent dips below 0.5, suggesting that the increased ambiguity leads to inconsistent performance across images.

Reaction Times

Overall reaction times

reaction_time <- combined_data %>%
  group_by(Condition) %>%
  summarise(
    Mean_ReactionTime = mean(response_time),
    SD_ReactionTime = sd(response_time),
    .groups = 'drop'
  )

print(reaction_time)

# A tibble: 3 × 3
  Condition  Mean_ReactionTime SD_ReactionTime
  <chr>                  <dbl>           <dbl>
1 Condition1             4014.           7403.
2 Condition2             2767.           1561.
3 Condition3             3042.           1409.

Average reaction times are longest in Condition1 (about 4014 ms) and shortest in Condition2 (about 2767 ms). Reaction times in Condition1 are also far more variable (SD of about 7403 ms) than in the other two conditions (SDs of about 1561 ms and 1409 ms).

Reaction times over images

reaction_time_by_image <- combined_data %>%
  group_by(Condition, correct_image) %>%
  summarise(
    Mean_ReactionTime = mean(response_time, na.rm = TRUE),
    .groups = 'drop'
  )

ggplot(reaction_time_by_image, aes(x = correct_image, 
                                   y = Mean_ReactionTime, 
                                   group = Condition, 
                                   color = Condition)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(
    title = "Reaction Times by Image for Each Condition",
    x = "Image",
    y = "Mean Reaction Time (ms)"
  ) +
  scale_x_continuous(breaks = unique(reaction_time_by_image$correct_image)) +
  facet_wrap(~Condition, scales = "free_x", ncol = 1) +
  theme_minimal()

Reaction times in Condition 1 are generally stable across images, except for a sharp spike at Image 8. With our small sample, this spike could reflect a few unusually slow responses or a specific difficulty with that image. Reaction times are relatively consistent across all images in Conditions 2 and 3.

Chance Performance

# The expected performance by chance for 2*2, 3*3, and 4*4 Condition is 1/4, 1/9, and 1/16
combined_data <- combined_data %>%
  mutate(chance_level = case_when(
    Condition == "Condition1" ~ 0.25,
    Condition == "Condition2" ~ 0.1111,
    Condition == "Condition3" ~ 0.0625
  ))

t_test_results <- combined_data %>%
  group_by(Condition) %>%
  summarise(
    t_test_p_value = t.test(correct, mu = unique(chance_level))$p.value,
    .groups = 'drop'
  )

print(t_test_results)

# A tibble: 3 × 2
  Condition  t_test_p_value
  <chr>               <dbl>
1 Condition1       7.31e-17
2 Condition2       4.99e-22
3 Condition3       6.72e-15

The t-tests indicate that accuracy exceeded the specified chance level in all three conditions. This is consistent with participants using cross-situational learning rather than random guessing, and it aligns with the findings of the original study.
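
The Procedure describes a four-alternative forced-choice test, so a chance level of 0.25 in every condition is one reasonable alternative to the values used above. A minimal sketch of an exact binomial test under that assumption follows. The counts are reconstructed from the reported accuracies on the assumption of 90 test trials per condition (5 participants × 18 words), so they should be read as illustrative. Treating trials as independent also ignores that responses are clustered within participants.

```r
# Counts reconstructed from the reported accuracies, assuming 90 test trials
# per condition (5 participants x 18 words); treat them as illustrative.
n_trials  <- 90
n_correct <- c(Condition1 = 66, Condition2 = 65, Condition3 = 50)

chance <- 0.25  # one correct picture among four alternatives

# One-sided exact binomial test of accuracy above chance, per condition
p_values <- sapply(n_correct, function(k) {
  binom.test(k, n_trials, p = chance, alternative = "greater")$p.value
})
print(p_values)
```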

Effect of Condition (ANOVA)

anova <- aov(correct ~ Condition, data = combined_data)
summary(anova)

             Df Sum Sq Mean Sq F value Pr(>F)  
Condition     2   1.79  0.8926   4.118 0.0173 *
Residuals   267  57.88  0.2168                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA shows a significant effect of condition on accuracy, F(2, 267) = 4.12, p = .017, meaning the level of ambiguity in word-referent pairings impacts participants’ performance.

Post-Hoc Analysis

tukey <- TukeyHSD(anova)
print(tukey)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = correct ~ Condition, data = combined_data)

$Condition
                             diff        lwr          upr     p adj
Condition2-Condition1 -0.01111111 -0.1746902  0.152467963 0.9859710
Condition3-Condition1 -0.17777778 -0.3413569 -0.014198703 0.0294468
Condition3-Condition2 -0.16666667 -0.3302457 -0.003087592 0.0447158

The Tukey comparisons show no statistically significant difference between Condition1 and Condition2 (p = .986), while accuracy in Condition3 is significantly lower than in both Condition1 (p = .029) and Condition2 (p = .045).

Additional Links

Stimuli: https://github.com/ucsd-psych201a/yu2007/tree/main/stimuli
Code: https://ucsd-psych201a.github.io/yu2007/final_coding_11.22.html

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.