Replication of Study Rapid Word Learning Under Uncertainty via Cross-Situational Statistics by Yu & Smith (2007, Psychological Science)

Author

Alison Park, Junyi Hui, Pengjia Cui, and Yawen Dong

Published

November 11, 2024

Introduction

The study Rapid Word Learning Under Uncertainty via Cross-Situational Statistics by Yu and Smith (2007) explored how adults learn word-referent pairs in highly ambiguous settings. Past studies of word learning have focused on constraints such as social, attentional, or linguistic cues that help solve the word-referent mapping problem. While these strategies perform well in controlled, minimally ambiguous contexts, real-world learning environments present learners with far greater complexity.

This raises an important question: can learners successfully acquire word-referent pairs in highly ambiguous settings through alternative means, even when they cannot determine the correct pairings within a single trial? To address this question, Yu and Smith proposed an alternative mechanism, cross-situational learning. They demonstrated that learners can track word-referent pairings across multiple trials by accumulating statistical associations over time rather than relying on immediate clarity within each learning instance.
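
The idea of accumulating associations across trials can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the number of trials, the simple co-occurrence counting rule, and all variable names are hypothetical and are not taken from Yu and Smith (2007) or from our experiment code.

```r
# Sketch of cross-situational learning as co-occurrence counting.
# ASSUMPTIONS: 27 trials, random trial composition, and a learner that
# simply counts how often each word is heard while each picture is visible.
set.seed(1)

n_pairs   <- 18   # word-referent pairs in one condition
per_trial <- 4    # words and pictures shown together (the 4x4 case)
n_trials  <- 27   # hypothetical number of training trials

# Word i names picture i, so one index vector describes a whole trial.
trials <- replicate(n_trials, sample(n_pairs, per_trial), simplify = FALSE)

# counts[w, r] = number of trials on which word w and picture r co-occurred.
counts <- matrix(0, nrow = n_pairs, ncol = n_pairs)
for (shown in trials) {
  counts[shown, shown] <- counts[shown, shown] + 1
}

# The simulated learner picks, for each word, the picture seen with it most often.
guess    <- max.col(counts, ties.method = "first")
accuracy <- mean(guess == seq_len(n_pairs))
accuracy
```

Within a single trial every word is equally associated with every picture shown, so no pairing can be resolved. Across trials, a word always co-occurs with its own referent, so the correct cell in each row can never fall below the others and usually pulls ahead of them.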

Design Overview

One factor was manipulated in the study: within-trial ambiguity. The manipulation operates through three conditions in which the number of words and referents presented per trial varied (2×2, 3×3, and 4×4).
Two measures were taken: accuracy in learning word-referent pairs and response time.
The study employed a within-participants design as each participant experienced all three conditions.
Measures were repeated across each condition for every participant.
Applying a between-participants design instead of a within-participants design would increase variance due to individual learning differences.
The study reduced demand characteristics by using pseudowords and by not providing explicit cues linking words to specific referents; participants therefore had to rely solely on cross-trial statistical learning.
A potential confound is the repeated exposure to pseudowords and objects, which could lead participants to develop strategies based on familiarity or memorization rather than on cross-trial statistical learning.
The use of pseudowords and uncommon objects may limit generalizability to real-world language learning, where learners often have social and contextual cues available. Also, testing was limited to adult participants, so findings may not generalize well to children.

Methods

Power Analysis

Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.

Planned Sample

Thirty-eight participants were recruited for the original study and received either course credit or $7. Our replication aims to recruit a similar or slightly larger sample from Prolific, keeping the sample size consistent with the original design.

Materials

“The stimuli were slides containing pictures of uncommon objects (e.g., canister, facial sauna, and rasp) paired with auditorily presented pseudowords. These artificial words were generated by a computer program to sample English forms that were broadly phonotactically probable; they were produced by a synthetic female voice in monotone. There were 54 unique objects and 54 unique pseudowords partitioned into three sets of 18 words and referents for use in the three conditions. The training trials were generated by randomly pairing each word with one picture; these were the word-referent pairs to be discovered by the learner. The three learning conditions differed in the number of words and referents presented on each training trial: 2-2 Condition: 2 words and 2 pictures; 3-3 Condition: 3 words and 3 pictures; 4-4 Condition: 4 words and 4 pictures” (Yu and Smith 2007)

Procedure

“The pictures were presented on a 17-in. computer screen, and the sound was played by the speakers connected to the same computer. Subjects were instructed that their task was to learn the words and referents, but they were not told that there was one referent per word. They were told that multiple words and pictures would co-occur on each trial and that their task was to figure out across trials which word went with which picture. After training in each condition, subjects received a four-alternative forced-choice test of learning. On the test, they were presented with 1 word and 4 pictures and asked to indicate the picture named by that word. The target picture and the 3 foils were all drawn from the set of 18 training pictures.” (Yu and Smith 2007)

Analysis Plan

The primary analysis will involve a one-way ANOVA to compare learning accuracy across the three conditions (2×2, 3×3, and 4×4). In this setup, the independent variable is the condition (level of ambiguity), and the dependent variable is the accuracy of word-object pair identification.
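
Because every participant completes all three conditions, one way the ANOVA could be specified in R is as a repeated-measures model. The sketch below runs on simulated data only; the condition means, the noise level, and the column names are placeholders chosen for illustration, and the repeated-measures specification is our assumption about how the one-way ANOVA might be implemented.

```r
# Sketch of a one-way repeated-measures ANOVA on SIMULATED accuracy data.
# ASSUMPTIONS: placeholder condition means, sd = 0.10, and 38 participants
# (the original sample size; the replication N is not fixed yet).
set.seed(42)

n_participants <- 38
conditions     <- c("2x2", "3x3", "4x4")

# One row per participant x condition.
sim <- expand.grid(
  participant = factor(seq_len(n_participants)),
  condition   = factor(conditions, levels = conditions)
)

placeholder_means <- c("2x2" = 0.85, "3x3" = 0.75, "4x4" = 0.55)
sim$accuracy <- unname(placeholder_means[as.character(sim$condition)]) +
  rnorm(nrow(sim), sd = 0.10)
sim$accuracy <- pmin(pmax(sim$accuracy, 0), 1)   # keep proportions in [0, 1]

# Error(participant / condition) tests condition against within-participant variance.
fit <- aov(accuracy ~ condition + Error(participant / condition), data = sim)
summary(fit)
```

With real data, `sim` would be replaced by a table holding one mean accuracy per participant and condition.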

We will also examine response times across conditions to investigate whether higher ambiguity affects the speed of learning, which may contribute to understanding cognitive processing under different conditions.

Data cleaning will include the exclusion of trials where response times are excessively high or low to account for inattentiveness or random guessing. Also, participants performing below chance level overall will be excluded from the analysis, as this suggests they may not have engaged meaningfully with the task.
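
The two exclusion rules can be sketched as follows. The response-time cutoffs (200 ms and 30,000 ms) are hypothetical values chosen only for illustration, since the plan does not fix them; the chance level of 1/4 assumes a four-alternative test as in the original study; the toy data are invented.

```r
# Sketch of the planned exclusions on INVENTED trial-level data.
# ASSUMPTIONS: rt_min and rt_max are placeholder cutoffs, not preregistered values.
toy_trials <- data.frame(
  participant   = rep(c("p1", "p2", "p3"), each = 4),
  correct       = c(TRUE, TRUE, FALSE, TRUE,
                    FALSE, FALSE, FALSE, FALSE,
                    TRUE, TRUE, TRUE, FALSE),
  response_time = c(3100, 2800, 150, 3500,
                    4000, 4200, 3900, 4100,
                    3000, 60000, 2900, 3300)
)

rt_min <- 200     # ms, assumed lower cutoff
rt_max <- 30000   # ms, assumed upper cutoff
chance <- 1 / 4   # one target among four pictures

# Step 1: drop trials with implausibly short or long response times.
kept <- toy_trials[toy_trials$response_time >= rt_min &
                     toy_trials$response_time <= rt_max, ]

# Step 2: drop participants whose overall accuracy is below chance.
acc_by_participant <- tapply(kept$correct, kept$participant, mean)
keep_ids <- names(acc_by_participant)[acc_by_participant >= chance]
cleaned_demo <- kept[kept$participant %in% keep_ids, ]

nrow(cleaned_demo)   # 6 trials remain, from p1 and p3
```

In this toy example one trial is removed for being too fast, one for being too slow, and participant p2 is excluded for scoring below chance.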

Differences from Original Study

Sample: The original study included 38 undergraduate participants from Indiana University. Our sample may differ slightly due to recruitment constraints; participants will probably be drawn from a broader demographic pool, which could introduce variability in learning abilities or prior exposure to similar experimental tasks. However, as cross-situational learning mechanisms are believed to be consistent across adult populations, we do not expect the sample difference to significantly affect the findings.
Setting: In the original study, participants completed the trials in a controlled lab environment. Our replication may be conducted entirely online. Running the experiment outside of a laboratory could introduce additional distractions or variations. As the original research suggests that cross-situational learning effects are resilient to minor environmental changes, we do not expect this difference to significantly influence the outcome.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data from the experiment will be imported in a format compatible with R. The dataset will include participant IDs, condition labels (2×2, 3×3, 4×4), trial responses, accuracy scores, and response times for each trial.
As noted above, trials with response times far above or below the mean, and participants who perform below chance level across conditions, will be excluded.
After filtering the data, we will calculate mean accuracy for each participant across conditions (2×2, 3×3, 4×4) and create a new column for analysis.
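
A minimal base-R sketch of the aggregation step is shown below. The toy data and the column name mean_accuracy are hypothetical; with the tidyverse already used in this report, group_by() followed by summarize() would give the same table.

```r
# Sketch: one mean accuracy per participant and condition, on INVENTED data.
toy_data <- data.frame(
  participant = rep(c("p1", "p2"), each = 6),
  condition   = rep(rep(c("2x2", "3x3", "4x4"), each = 2), times = 2),
  correct     = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
                  TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# mean() of a logical vector is the proportion of TRUE values.
participant_means <- aggregate(correct ~ participant + condition,
                               data = toy_data, FUN = mean)
names(participant_means)[names(participant_means) == "correct"] <- "mean_accuracy"
participant_means
```

The resulting table has one row per participant and condition, which is the shape needed for the across-condition comparison.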

Confirmatory analysis

Because the code for the 3×3 and 4×4 conditions was not yet finished, we asked 10 pilot participants to complete only the 2×2 condition. The following analysis is therefore based solely on their performance in that condition. With data from a single condition, the planned ANOVA across conditions cannot be run at this stage.

Accuracy

accuracy <- mean(cleaned_data$correct, na.rm = TRUE)
print(paste("Overall Accuracy: ", round(accuracy * 100, 2), "%"))

[1] "Overall Accuracy:  82.78 %"

In the pilot test, overall accuracy in the 2×2 condition across the 10 participants was 0.828, meaning that on average each participant correctly matched about 15 of the 18 pseudowords with their corresponding images (the pictures of uncommon objects described above).
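
For context, this level of accuracy can be compared with guessing. The sketch below assumes that the pilot test was four-alternative forced choice, as in the original study, so that chance is 1/4. The score of 15 out of 18 stands in for a typical participant and is not an actual data point.

```r
# Sketch: how likely is a score of 15/18 under pure guessing?
# ASSUMPTIONS: four response options per test trial; 15/18 is illustrative.
n_items   <- 18
chance    <- 1 / 4
n_correct <- round(0.828 * n_items)   # about 15 items

# P(X >= 15) when X ~ Binomial(18, 0.25)
p_guess <- pbinom(n_correct - 1, size = n_items, prob = chance,
                  lower.tail = FALSE)

# The same quantity from an exact one-sided binomial test.
bt <- binom.test(n_correct, n_items, p = chance, alternative = "greater")

c(p_guess = p_guess, p_binom_test = bt$p.value)
```

Both calls return the same very small probability, so a typical pilot score is far above what guessing would produce.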

image_accuracy <- cleaned_data %>%
  group_by(correct_image) %>%
  summarize(accuracy = mean(correct, na.rm = TRUE))

ggplot(image_accuracy, aes(x = correct_image, y = accuracy)) +
  geom_line(color = "forestgreen", linewidth = 1) +
  geom_point(color = "skyblue3", size = 2) +
  scale_x_continuous(breaks = seq(1, 18, 1)) +
  labs(x = "Image",
       y = "Accuracy")

The line plot presents the accuracy for each image, ranging from 0.6 to 1.0. Differences in accuracy could be due to certain images being more ambiguous, unfamiliar, or harder to associate with the correct pseudowords.

Response Time

response_time <- cleaned_data %>%
  group_by(correct) %>%
  summarize(
    mean_response_time = mean(response_time, na.rm = TRUE),
    median_response_time = median(response_time, na.rm = TRUE),
    sd_response_time = sd(response_time, na.rm = TRUE)
  )

print(response_time)

# A tibble: 2 × 4
  correct mean_response_time median_response_time sd_response_time
  <lgl>                <dbl>                <dbl>            <dbl>
1 FALSE                4356.                 4071            1820.
2 TRUE                 3453.                 3112            1683.

As shown in the table, participants in the pilot test took significantly longer to respond when their answers were incorrect (mean response time = 4356.23 ms, SD = 1819.59 ms) compared to when their answers were correct (mean response time = 3453.13 ms, SD = 1682.96 ms).

t-test

t_test <- t.test(
  response_time ~ correct,
  data = cleaned_data
)

print(t_test)


    Welch Two Sample t-test

data:  response_time by correct
t = 3.6007, df = 84.099, p-value = 0.0005357
alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
95 percent confidence interval:
  404.3413 1401.8553
sample estimates:
mean in group FALSE  mean in group TRUE 
           4356.226            3453.128

Since p < 0.05, we reject the null hypothesis and conclude that there is a statistically significant difference in response times between the two groups. Response times were significantly faster for correct responses (correct = TRUE) than for incorrect responses (correct = FALSE).

Power Analysis

cohens_d <- cohens_d(response_time ~ correct, data = cleaned_data)
cohens_d

Cohen's d |       95% CI
------------------------
0.53      | [0.25, 0.81]

- Estimated using pooled SD.

pwr_result <- pwr.t.test(
  d = 0.53,
  sig.level = 0.05,
  power = 0.80,
  type = "two.sample"
)
print(pwr_result)


     Two-sample t test power calculation 

              n = 56.86016
              d = 0.53
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

The pilot test suggests a medium effect size of d = 0.53 in the 2×2 condition. Using pwr.t.test from the pwr package in R, we conducted a power analysis for a two-sample t-test with a significance level of alpha = 0.05 and a desired power of 0.80. The results indicated that we would need at least 57 participants per group (n = 56.86, rounded up) to achieve sufficient statistical power for detecting this effect size. However, this estimate may be biased because only the 2×2 condition was taken into account.
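
The same calculation can be extended to the 80%, 90%, and 95% power levels named in the Power Analysis section under Methods. The sketch below uses power.t.test() from base R's stats package rather than pwr, so it runs without extra packages. It assumes the pilot estimate d = 0.53 carries over unchanged, which is uncertain given the small pilot.

```r
# Sketch: per-group sample size for three power targets, given d = 0.53.
# ASSUMPTION: the pilot effect size is taken at face value.
d       <- 0.53
targets <- c(0.80, 0.90, 0.95)

n_needed <- sapply(targets, function(p) {
  # With sd = 1, delta is expressed in standard-deviation units, i.e. Cohen's d.
  power.t.test(delta = d, sd = 1, sig.level = 0.05, power = p,
               type = "two.sample", alternative = "two.sided")$n
})

power_table <- data.frame(power = targets,
                          n_per_group = ceiling(n_needed))
power_table
```

The first row reproduces the 57 per group reported above; the higher power targets require progressively larger groups.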

Additional Links

Stimuli: https://github.com/ucsd-psych201a/yu2007/tree/main/stimuli
Code (2×2 condition only): https://github.com/ucsd-psych201a/yu2007/blob/main/coding_test.html

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.