We conducted three experiments examining children’s ability to understand sine wave speech; Experiment 1 also assessed children’s ability to interpret Mooney pictures. The three experiments examined baseline interpretation of sine wave speech, pop-out effects, and perceptual learning, respectively.
Children in this study had to match a human speech sentence to one of two robot (i.e., sine wave speech) sentences, or they had to match a clear picture to one of two Mooney pictures. The aim was to get a first assessment of children’s ability to process distorted stimuli using a top-down cue.
Participants were tested at the Edinburgh Zoo, at the Edinburgh DevLab, and in the local community. This was a low-n pilot study; the table below shows participants per age group.
| Age (years) | Mooney Pictures (n) | Sine-Wave Speech (n) |
|---|---|---|
| 2 | 5 | 5 |
| 3 | 5 | 5 |
| 4 | 9 | 9 |
| 5 | 3 | 4 |
| 6 | 3 | 3 |
| 9 | 1 | 1 |
Participants completed 3 practice trials with blurred images or noise-distorted speech, followed by 24 test trials with Mooney images or sine wave speech. On each trial, the child had to select either which of the robot’s pictures matched the teacher’s standard, or which of the robot’s sentences matched the teacher’s standard. To play the sentences, the experimenter tapped the Teacher. Triplets of sentences could be repeated, but individual sentences could not be repeated.
Left: 2AFC Mooney picture choice. Right: 2AFC sine wave speech choice
Comparing age groups across the two studies. Most participants (but not all) did both experiments. Error bars are 95% CIs
All age groups were above chance when interpreting Mooney pictures, although 2-year-olds had relatively low performance. 2- and 3-year-olds were not credibly different from chance when interpreting sine wave speech. Performance on the sine wave speech task improved more gradually with age.
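The Experiment 1 analysis code isn’t echoed in this report. As a minimal sketch of the kind of above-chance check described here, assuming a hypothetical long-format data frame `expt1` with one row per 2AFC trial and columns `age` and `correct` (these names are placeholders, not the actual analysis objects):

```r
# Illustrative sketch only -- not the original Experiment 1 analysis.
# expt1, age, and correct are hypothetical names for a long-format dataset
# with one row per 2AFC trial (chance = 0.5).
library(dplyr)

expt1 %>%
  group_by(age) %>%
  summarise(
    n_trials  = n(),
    n_correct = sum(correct),
    accuracy  = n_correct / n_trials,
    ci_lower  = binom.test(n_correct, n_trials, p = 0.5)$conf.int[1],
    ci_upper  = binom.test(n_correct, n_trials, p = 0.5)$conf.int[2]
  )
```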
Experiment 1 suggested that sine wave speech was harder to interpret than Mooney pictures, although the youngest children showed limited skill on both tasks. We had two concerns about this manipulation, however. First, the sine wave speech task places more of a demand on working memory than the picture task (as you are matching a heard sentence to a previously heard sample, rather than matching a visible picture to a visible sample). Second, 2AFC may be a conceptually difficult task for young children. Anecdotally, my RAs reported that these tasks were difficult for the younger children, especially the sine wave speech task.
Our subsequent two experiments only assessed sine wave speech. In Experiment 2, we moved to a method that we thought might be more naturalistic – picture selection.
Participants in Experiment 2 did a word recognition task. We thought that this method might better allow younger children to demonstrate their ability to process sine wave speech. Moreover, it also allowed us to assess children’s processing of sine wave sentences when heard naively, versus when heard after a clear speech prime. As a little extra bonus, we also assessed adults, and a second group of adults who were beginner learners of Italian (with stimuli translated into Italian). If second-language learners also show difficulty with sine wave speech, this could indicate that the problem for children lies in their degree of linguistic knowledge, rather than in connections between levels of representation.
Participants in this study were 3- and 4-year-olds learning English, adult English speakers learning Italian (with the stimuli translated into Italian), and adult English speakers. They saw three pictures on each trial and heard three sentences (e.g., point to the dog). The third sentence was always sine wave speech. On half of the trials the sine wave speech sentence mentioned the so-far untouched picture (i.e., it was a novel sentence; the different condition), and on half of the trials it mentioned one of the previously mentioned pictures (repeating either the first-heard sentence [far condition] or the second-heard sentence [near condition]). For half of the participants the sine wave speech sentences used the same voice as the clear sentences [match condition], and for the other half they used a different-gender voice [mismatch condition].
Unfortunately this design led to a large response bias in children, to touch the unmentioned picture on the third instruction.
Participants were tested in preschools around Perthshire, in the Edinburgh DevLab, and in the Edinburgh University community. Second-language speakers were recruited from an Introductory Italian class at Edinburgh University. The table below shows participants per age group.
expt2_summary <- expt2 %>%
dplyr::select(resp,subject_id,age_years,Group) %>%
dplyr::group_by(subject_id,age_years,Group) %>%
dplyr::summarise(responded = mean(resp,na.rm = T)) %>%
dplyr::group_by(age_years,Group) %>%
dplyr::select(responded,subject_id,age_years,Group) %>%
dplyr::summarise(n = length(responded))

| age_years | Adults: 1st Lang | Adults: 2nd Lang | Children |
|---|---|---|---|
| 2 | NA | NA | 4 |
| 3 | NA | NA | 37 |
| 4 | NA | NA | 37 |
| 5 | NA | NA | 2 |
| adult | 20 | 15 | NA |
Participants completed 1 practice trial with sine wave speech, followed by 24 test trials. On each trial, participants would see three pictures and hear three instructions; they had to tap the mentioned picture after each instruction. The Experimenter tapped the green button to produce each instruction.
Words and pictures were chosen such that, according to CDI norms, 75% of 2-year-olds know the word.
3-choice word recognition task
First we graph overall performance, then exclude trials where participants did not recognise one of the clear speech items. We collapse children into a single group, because there do not appear to be any interesting developmental differences.
You can see that accuracy is generally quite high for clear speech items, even in young children (suggesting the task is simple). But accuracy declines for distorted speech across all groups, especially for children and 2nd language speakers.
You can easily see the response bias – children are considerably more accurate on “different” trials (i.e., trials where the distorted sentence names the unheard word) than on Near/Far trials. But they are above chance overall, allowing for a large criterion effect, and they clearly interpret the repeated distorted speech as distinct from the unrepeated distorted speech (i.e., the ‘different’ condition).
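To put numbers on the bias, a quick summary like the sketch below complements the graphs; it uses the column names that appear in the plotting code (`resp`, `choose_unmentioned`, `distance`, `clarity`, `Group`) but is illustrative rather than the analysis actually run.

```r
# Sketch: quantify the response bias on children's distorted trials.
# Column names are taken from the plotting code in this report.
library(dplyr)

expt2 %>%
  filter(Group == "Children", clarity == "distorted") %>%
  group_by(distance) %>%
  summarise(
    accuracy             = mean(resp, na.rm = TRUE),
    p_choose_unmentioned = mean(choose_unmentioned, na.rm = TRUE)
  )
```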
ggplot(expt2, aes(x = distance, y = resp, color = clarity))+
facet_wrap(~speaker_match+Group,nrow=2) +
ylim(c(0,1))+
stat_summary_bin(fun.data = mean_cl_boot)+
theme_cowplot()+
ggtitle("Accuracy across groups and conditions")+
xlab("Condition (Unheard word, far prior word, near prior word)")+
ylab("Accuracy")+
labs(caption = "")
# Exclude trials where they got a clear word wrong
wrong_clear <- summaryBy(resp ~ subject_id + trial_id, data = subset(expt2, clarity != "distorted"))
wrong_clear <- subset(wrong_clear, resp.mean <1)
wrong_clear$subj_trial <- paste(wrong_clear$subject_id, wrong_clear$trial_id)
expt2$subj_trial <- paste(expt2$subject_id,expt2$trial_id)
expt2 <- expt2[!expt2$subj_trial %in% wrong_clear$subj_trial,]

To minimize the possibility that our analyses are contaminated by effects of lexical knowledge, we remove trials on which participants made a mistake on clear items (i.e., we only assess children’s behavior on words that we have good reason to believe they know). All analyses are now done on this restricted dataset.
ggplot(expt2, aes(x = distance, y = resp, color = clarity))+
facet_wrap(~speaker_match+Group,nrow=2) +
ylim(c(0,1))+
stat_summary_bin(fun.data = mean_cl_boot)+
theme_cowplot()+
ggtitle("Accuracy across groups and conditions")+
xlab("Condition (Unheard word, far prior word, near prior word)")+
ylab("Accuracy")+
labs(caption = "")

The graph below illustrates the response bias in children: a tendency to choose the picture that had not been mentioned by the clear speech instructions.
ggplot(expt2, aes(x = distance, y = choose_unmentioned, color = clarity))+
facet_wrap(~speaker_match+Group,nrow=2) +
ylim(c(0,1))+
stat_summary_bin(fun.data = mean_cl_boot)+
theme_cowplot()+
ggtitle("Choice of unmentioned picture across groups and conditions")+
xlab("Condition (Unheard word, far prior word, near prior word)")+
ylab("Choose unmentioned picture")

Because 1st language adults were close to ceiling in both conditions, and children were close to ceiling on clear speech, models that compare clear and distorted conditions don’t converge, or produce implausible parameter estimates. So we instead simply analyze accuracy at interpreting distorted speech across groups.
Analyses use Bayesian logistic regression models; in the final column of each table I’ve put *s next to parameters whose credible intervals do not include 0.
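The model-fitting code isn’t echoed in this report. As a rough sketch, a group comparison on distorted trials presumably takes a form like the one below, assuming brms (suggested by the Stan-style summary columns); the random-effects structure and the exact Group level labels are assumptions.

```r
# Illustrative sketch -- not the fitted model reported below.
# Assumes brms; the random-effects terms and the Group labels are guesses.
library(brms)

compare_children_1stLang <- brm(
  resp ~ Group + (1 | subject_id) + (1 | trial_id),
  data   = subset(expt2, clarity == "distorted" & Group != "Adults: 2nd Lang"),
  family = bernoulli()
)
summary(compare_children_1stLang)
```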
z<-1
kable(convert_stan_to_dataframe(compare_children_1stLang),digits=2, caption = "Accuracy understanding distorted speech for children vs. Adults (1st lang)")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 3.05 | 0.31 | 2.46 | 3.67 | 1837.28 | 1 | * |
| GroupChildren | -2.35 | 0.33 | -3.01 | -1.73 | 1828.11 | 1 | * |
kable(convert_stan_to_dataframe(compare_children_2ndLang),digits=2, caption = "Accuracy understanding distorted speech for children vs. 2nd language learners")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.81 | 0.24 | 0.35 | 1.28 | 1814.45 | 1 | * |
| GroupChildren | -0.13 | 0.26 | -0.64 | 0.37 | 1726.51 | 1 | - |
kable(convert_stan_to_dataframe(compare_1st_2ndLang),digits=2, caption = "Accuracy understanding distorted speech for Adult 1st language vs. 2nd language learners")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 3.60 | 0.54 | 2.6 | 4.72 | 559.45 | 1 | * |
| GroupAdults:2ndLang | -2.69 | 0.73 | -4.2 | -1.36 | 477.27 | 1 | * |
#kable(convert_stan_to_dataframe(compare_children_1stLang_distorted),digits=2, caption = "Accuracy on distorted speech for Children vs Adults 1st language")

We also compared overall accuracy on distorted speech across groups depending on whether the distorted and undistorted voices matched. We see interactions between children and adults (both adult groups), such that children are actually less affected by speaker match – this might hint that children are really failing to use the top-down cue here. But when analyzed separately, neither 1st nor 2nd language adults show the speaker_match effect. That is to say, there is a difference between groups, but neither group’s effect is credibly different from zero. Harrumph…
z<-1
kable(convert_stan_to_dataframe(compare_children_1st_speaker_match),digits=2, caption = "Matching vs mismatching distorted speech for children vs. 1st language adults")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 4.09 | 0.49 | 3.19 | 5.13 | 1144.63 | 1 | * |
| speaker_matchSpeakerMismatch | -1.89 | 0.61 | -3.10 | -0.68 | 962.87 | 1 | * |
| GroupChildren | -3.56 | 0.51 | -4.62 | -2.65 | 1097.27 | 1 | * |
| speaker_matchSpeakerMismatch:GroupChildren | 2.17 | 0.65 | 0.89 | 3.52 | 863.60 | 1 | * |
kable(convert_stan_to_dataframe(compare_children_2nd_speaker_match),digits=2, caption = "Matching vs mismatching distorted speech for children vs. 2nd language adults")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.74 | 0.26 | 0.23 | 1.26 | 853.10 | 1.00 | * |
| speaker_matchSpeakerMismatch | -0.76 | 0.43 | -1.62 | 0.12 | 696.34 | 1.01 | - |
| GroupChildren | -0.22 | 0.30 | -0.80 | 0.36 | 811.49 | 1.00 | - |
| speaker_matchSpeakerMismatch:GroupChildren | 1.03 | 0.47 | 0.09 | 1.96 | 659.93 | 1.01 | * |
kable(convert_stan_to_dataframe(compare_1st_2nd_speaker_match),digits=2, caption = "Matching vs mismatching distorted speech for 1st language adults vs 2nd language")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 4.47 | 0.69 | 3.28 | 5.96 | 1000.97 | 1 | * |
| speaker_matchSpeakerMismatch | -1.90 | 0.86 | -3.62 | -0.23 | 808.60 | 1 | * |
| GroupAdults:2ndLang | -3.65 | 0.83 | -5.37 | -2.19 | 727.82 | 1 | * |
| speaker_matchSpeakerMismatch:GroupAdults:2ndLang | 1.04 | 1.19 | -1.35 | 3.44 | 696.99 | 1 | - |
z<-1
kable(convert_stan_to_dataframe(children_match),digits=2, caption = "Matching vs mismatching distorted speech for children")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.52 | 0.12 | 0.28 | 0.77 | 834.13 | 1 | * |
| speaker_matchSpeakerMismatch | 0.27 | 0.18 | -0.08 | 0.64 | 1338.77 | 1 | - |
kable(convert_stan_to_dataframe(adult_1st_match),digits=2, caption = "Matching vs mismatching distorted speech for 1st language adults")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 5.30 | 1.27 | 3.38 | 8.22 | 542.42 | 1 | * |
| speaker_matchSpeakerMismatch | -2.33 | 1.54 | -5.63 | 0.38 | 528.04 | 1 | - |
kable(convert_stan_to_dataframe(adult_2nd_match),digits=2, caption = "Matching vs mismatching distorted speech for 2nd language adults")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.78 | 0.38 | 0.05 | 1.57 | 552.12 | 1 | * |
| speaker_matchSpeakerMismatch | -0.80 | 0.65 | -2.10 | 0.41 | 492.97 | 1 | - |
Do different participant groups perform better on near-primed stimuli or far-primed stimuli?
z<- 2
kable(convert_stan_to_dataframe(child_distance),digits=2, caption = "Effect of prime distance for children")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | -1.12 | 0.31 | -1.76 | -0.51 | 379.43 | 1 | * |
| distancenear | 0.47 | 0.21 | 0.06 | 0.91 | 2000.00 | 1 | * |
kable(convert_stan_to_dataframe(first_lang_distance),digits=2, caption = "Effect of prime distance for 1st language adults")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 4.31 | 1.13 | 2.61 | 6.92 | 972.56 | 1 | * |
| distancenear | -0.08 | 0.56 | -1.15 | 1.01 | 4000.00 | 1 | - |
kable(convert_stan_to_dataframe(second_lang_distance),digits=2, caption = "Effect of prime distance for 2nd language adults")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.22 | 1.08 | -1.94 | 2.33 | 443.79 | 1.01 | - |
| distancenear | 1.90 | 0.96 | 0.24 | 4.12 | 768.30 | 1.00 | * |
Did participants’ accuracy increase over the study? Yes for 1st language adults, no for children. But it is unclear for 2nd language adults – probably we need a larger sample.
This graph illustrates learning over time. You can see that 1st language adults appear to improve a little, and perhaps 2nd language adults too. But not Children.
ggplot(subset(expt2, clarity == "distorted"), aes(x = trial_number, y = resp, color = Group))+
geom_smooth(method = "glm", method.args = list(family = "binomial")) +
ylim(c(0,1.05))+
ggtitle("Change in accuracy over the study")+
stat_summary_bin(fun.data = mean_cl_boot)+
ylab("Accuracy")+
xlab("Trial number")kable(convert_stan_to_dataframe(learning_overall),digits=2, caption = "Effect of learning over trials across groups")| Estimate | Est.Error | l.95..CI | u.95..CI | Eff.Sample | Rhat | Diff_from_zero | |
|---|---|---|---|---|---|---|---|
| Intercept | 3.22 | 0.33 | 2.60 | 3.89 | 1452.73 | 1 | * |
| scaletrial_number | 0.52 | 0.21 | 0.11 | 0.94 | 1553.70 | 1 | * |
| GroupAdults:2ndLang | -2.40 | 0.44 | -3.32 | -1.58 | 1154.03 | 1 | * |
| GroupChildren | -2.52 | 0.35 | -3.23 | -1.87 | 1372.48 | 1 | * |
| scaletrial_number:GroupAdults:2ndLang | -0.39 | 0.26 | -0.94 | 0.10 | 1681.42 | 1 | - |
| scaletrial_number:GroupChildren | -0.56 | 0.22 | -0.99 | -0.15 | 1710.10 | 1 | * |
kable(convert_stan_to_dataframe(learning_children),digits=2, caption = "Effect of learning over trials for children")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.68 | 0.09 | 0.51 | 0.86 | 676.07 | 1 | * |
| scaletrial_number | -0.04 | 0.06 | -0.15 | 0.08 | 2000.00 | 1 | - |
kable(convert_stan_to_dataframe(learning_first_lang),digits=2, caption = "Effect of learning over trials for first lang adults")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 4.38 | 0.89 | 2.91 | 6.41 | 693.77 | 1 | * |
| scaletrial_number | 0.63 | 0.25 | 0.16 | 1.12 | 2000.00 | 1 | * |
kable(convert_stan_to_dataframe(learning_second_lang),digits=2, caption = "Effect of learning over trials for second lang adults")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.89 | 0.41 | 0.09 | 1.68 | 653.44 | 1.01 | * |
| scaletrial_number | 0.14 | 0.24 | -0.35 | 0.63 | 1503.60 | 1.00 | - |
z<-1

This graph illustrates the (log) number of times that participants were given a repetition of the clear/distorted speech. Different colored lines indicate whether the stimuli were on familiar trials (distorted speech as a repetition) or not. Children were given more repetitions overall.
repetition <- ggplot(expt2, aes(x = trial_number, y = log(play_counter), color= novelty))+
geom_smooth(method = "loess")+
facet_wrap(~clarity+Group,ncol=3, nrow = 2) +
ylab("log Number of repetitions per trial")+
ggtitle("Repetitions per trial over the study")
repetition

While Experiment 2 provided evidence that younger children can indeed interpret sine wave speech, they were clearly error-prone in doing so. Unfortunately, this experimental design was heavily contaminated by response bias, which makes it hard to draw strong conclusions, particularly about whether children show top-down pop-out effects.
We replicated a finding by Nittrouer and Lowenstein (2013) that second-language learners have difficulty interpreting sine wave speech. This is particularly interesting, because it raises the possibility that children’s difficulties are not driven by immature top-down connections, but by incomplete linguistic knowledge.
Our next experiment had three goals: to remove the response bias, to test for pop-out effects, and to assess perceptual learning.
Participants in Experiment 3 also did a word recognition task, but here we a) removed the response bias, and b) added a perceptual learning component (following Davis et al. 2005). Participants saw two pictures on each trial and heard three sentences (e.g., point to the dog). They heard 24 trials altogether (plus one practice at the start). The first 16 trials were training trials, and the final 8 trials assessed learning and generalization. The order in which participants heard the first 16 sentences determined whether their learning was top-down guided, or not. For the top-down guidance participants, the third sentence was always sine wave speech, and it was a distorted repetition of one of the two previously heard sentences. For the unguided participants, the first sentence was always sine wave speech, and it was a distorted version of a sentence that they would eventually hear. This design thus allowed us to manipulate learning style (top-down or not) and also to test whether children show pop-out effects (if so, they should be more accurate during top-down training trials).
The final 8 trials were always presented in the order Distorted-Clear-Clear, in order to test learning and generalization.
Participants in this study were 2-, 3-, and 4-year-olds learning English.
Participants were tested in preschools around Edinburgh, in the Edinburgh DevLab, and in the Edinburgh University community. The table below shows participants per age group. We aimed to test 8 participants per cell, but kept testing available participants until all cells were filled. This led to the surplus of 4-year-olds, and to our testing some 5-year-olds.
expt3_summary <- children %>%
dplyr::select(resp,subject_id,age,test_lang) %>%
dplyr::group_by(subject_id,age,test_lang) %>%
dplyr::summarise(responded = mean(resp,na.rm = T)) %>%
dplyr::group_by(age,test_lang) %>%
dplyr::select(responded,subject_id,age,test_lang) %>%
dplyr::summarise(n = length(responded))

| age | Clear_After | Clear_Before |
|---|---|---|
| 2 | 9 | 8 |
| 3 | 8 | 11 |
| 4 | 14 | 14 |
| 5 | 2 | 1 |
Participants completed 1 practice trial with sine wave speech, followed by the 16 training trials and 8 test trials. On each trial, participants would see two pictures and hear three instructions; they had to tap the mentioned picture after each instruction. The Experimenter tapped the green button to produce each instruction.
2-choice word recognition task
Half the participants were assigned to a top-down condition: During the 16 training trials they heard two clear speech instructions followed by a distorted speech instruction (Clear-Clear-Distorted order). The other participants were assigned to a bottom-up condition: during their 16 training trials they heard a distorted speech instruction followed by two clear speech instructions (Distorted-Clear-Clear order).
After the 16 training trials, all participants took part in 8 test trials where distorted speech was presented before, and clear speech after (i.e., Distorted-Clear-Clear order).
Structure of Experiment 3
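For reference, the sketch below maps the condition labels used in the data (the test_lang variable, with levels Clear_Before and Clear_After) onto the instruction orders described above; it is purely illustrative and not the materials-generation code.

```r
# Sketch linking the test_lang condition labels to within-trial instruction order.
training_order <- list(
  Clear_Before = c("clear", "clear", "distorted"),  # top-down: clear speech heard first
  Clear_After  = c("distorted", "clear", "clear")   # bottom-up: distorted speech heard naively
)
test_order <- c("distorted", "clear", "clear")       # identical for all participants at test
```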
We do our analyses only on trials in which participants got the “clear” presentation of the words correct (because we don’t want to look at vocab knowledge, at least not for now). But I’ll first show the proportion of trials on which children were correct for the clear presentation. As you’ll see accuracy is high. Error bars are 95% CIs. We don’t graph 5-year-olds as a separate group, as there are only 3 of them, but they are included in analyses.
ggplot(subset(children, clarity !="distorted" & age != "5"), aes(x = block, y = acc, color = test_lang))+facet_wrap(~age,nrow=1) + ylim(c(0,1))+stat_summary_bin(fun.data = mean_cl_boot)+theme_cowplot()+ggtitle("Accuracy on clear speech")From now on, we remove all trials on which participants got the clear word incorrect. The first graph shows averaged data by block, and the second shows trial-by-trial data over the course of the study (test trials start on block 16).
# Exclude trials where they got a clear word wrong
wrong_clear <- summaryBy(acc ~ subject_id + trial_id, data = subset(children, clarity != "distorted"))
wrong_clear <- subset(wrong_clear, acc.mean <1)
wrong_clear$subj_trial <- paste(wrong_clear$subject_id, wrong_clear$trial_id)
children$subj_trial <- paste(children$subject_id,children$trial_id)
children <- children[!children$subj_trial %in% wrong_clear$subj_trial,]
age <- ggplot(subset(children, clarity=="distorted" & age != "5"), aes(x = block, y = acc, color = test_lang))+
facet_wrap(~age,nrow=1) + ylim(c(0,1))+
stat_summary_bin(fun.data = mean_cl_boot)+theme_cowplot()+ggtitle("Accuracy across ages, blocks and training types")
speaker_match <- ggplot(subset(children, clarity=="distorted" & age != "5"), aes(x = block, y = acc, color = test_lang))+
facet_wrap(~speaker_match+age,nrow=2) + ylim(c(0,1))+
stat_summary_bin(fun.data = mean_cl_boot)+theme_cowplot()+ggtitle("Effect of speaker match")
trial <- ggplot(subset(children, clarity == "distorted" & age != "5"), aes(x = trial_number, y = acc, color = test_lang))+
geom_smooth(method = "glm", method.args = list(family = "binomial"))+
facet_wrap(~clarity+age,nrow=1) +
ylim(c(0,1.05))+
ggtitle("Effect of learning")+
ylab("Accuracy")
distance <- ggplot(subset(children, clarity=="distorted" & age != "5" & test_lang == "Clear_Before" & block == "Training"), aes(x = block, y = acc, color = distance))+
facet_wrap(~age,nrow=1) + ylim(c(0,1))+
stat_summary_bin(fun.data = mean_cl_boot)+theme_cowplot()+ggtitle("Effect of distance on Clear Before participants")
repetition <- ggplot(subset(children, age != "5"), aes(x = trial_number, y = log(play_counter), color = test_lang))+
geom_smooth(method = "loess")+
facet_wrap(~clarity+age,nrow=2) +
ylab("log Number of repetitions per trial")+
ggtitle("Repetitions per trial over the study")
#plot_grid(age, speaker_match,trial,distance)

We’ll start with a full analysis of distorted trials, comparing Block (training/test), Training regime (clear before/after), age (scaled in weeks), and speaker match (match/mismatch). Unsurprisingly, nothing is significant, except that kids get better with age.
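As with Experiment 2, the model code isn’t echoed here. A sketch of what the full model presumably looks like is below, assuming brms; the random-effects term is an assumption, and block is treated as an ordered factor, which is what the block.L contrast in the table suggests.

```r
# Illustrative sketch of the full Experiment 3 model -- not the fitted object.
# Assumes brms; the random-effects structure is a guess.
library(brms)

major_analysis.expt3 <- brm(
  acc ~ speaker_match * test_lang * scale(age_weeks) * block + (1 | subject_id),
  data   = subset(children, clarity == "distorted"),
  family = bernoulli()
)
```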
z <- 1
kable(convert_stan_to_dataframe(major_analysis.expt3),digits=2, caption = "Full Analysis")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.79 | 0.15 | 0.52 | 1.09 | 756.65 | 1.00 | * |
| speaker_matchSpeakerMismatch | 0.07 | 0.13 | -0.20 | 0.32 | 989.62 | 1.00 | - |
| test_langClear_Before | 0.23 | 0.13 | -0.04 | 0.48 | 1178.35 | 1.00 | - |
| scaleage_weeks | 0.29 | 0.14 | 0.01 | 0.56 | 1251.62 | 1.00 | * |
| block.L | 0.12 | 0.10 | -0.07 | 0.33 | 2000.00 | 1.00 | - |
| speaker_matchSpeakerMismatch:test_langClear_Before | -0.07 | 0.13 | -0.32 | 0.18 | 985.51 | 1.00 | - |
| speaker_matchSpeakerMismatch:scaleage_weeks | 0.10 | 0.14 | -0.18 | 0.37 | 1150.47 | 1.00 | - |
| test_langClear_Before:scaleage_weeks | 0.08 | 0.14 | -0.20 | 0.34 | 933.29 | 1.00 | - |
| speaker_matchSpeakerMismatch:block.L | 0.06 | 0.09 | -0.12 | 0.24 | 2000.00 | 1.00 | - |
| test_langClear_Before:block.L | 0.06 | 0.10 | -0.14 | 0.25 | 2000.00 | 1.00 | - |
| scaleage_weeks:block.L | -0.02 | 0.10 | -0.22 | 0.18 | 2000.00 | 1.00 | - |
| speaker_matchSpeakerMismatch:test_langClear_Before:scaleage_weeks | 0.14 | 0.14 | -0.12 | 0.42 | 1029.46 | 1.01 | - |
| speaker_matchSpeakerMismatch:test_langClear_Before:block.L | 0.08 | 0.10 | -0.11 | 0.28 | 2000.00 | 1.00 | - |
| speaker_matchSpeakerMismatch:scaleage_weeks:block.L | -0.13 | 0.11 | -0.34 | 0.08 | 2000.00 | 1.00 | - |
| test_langClear_Before:scaleage_weeks:block.L | 0.09 | 0.10 | -0.11 | 0.29 | 2000.00 | 1.00 | - |
| speaker_matchSpeakerMismatch:test_langClear_Before:scaleage_weeks:block.L | -0.17 | 0.11 | -0.38 | 0.04 | 2000.00 | 1.00 | - |
age

There was no large pop-out effect. Children improve with age, but not by much.
z <- 1
kable(convert_stan_to_dataframe(major_analysis.expt3.train), digits=3, caption = "Are children more accurate when the clear sentence comes before? Not a lot more accurate (credible interval includes 0)")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.592 | 0.134 | 0.347 | 0.865 | 785.697 | 1.005 | * |
| test_langClear_Before | 0.154 | 0.129 | -0.092 | 0.412 | 670.967 | 0.999 | - |
| scaleage_weeks | 0.292 | 0.127 | 0.040 | 0.542 | 725.247 | 1.002 | * |
| test_langClear_Before:scaleage_weeks | 0.027 | 0.131 | -0.224 | 0.288 | 683.396 | 1.002 | - |
Did training that allowed pop-out subsequently allow children to do better at test? The effect is in the right direction, but is not significant.
z <- 1
kable(convert_stan_to_dataframe(major_analysis.expt3.test), digits=3, caption = "Are children more accurate at Test when they have been trained on the clear sentence coming before? Maybe a little...")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.878 | 0.194 | 0.505 | 1.281 | 1805.327 | 1.001 | * |
| test_langClear_Before | 0.256 | 0.178 | -0.083 | 0.618 | 2000.000 | 1.001 | - |
| scaleage_weeks | 0.278 | 0.215 | -0.150 | 0.714 | 2000.000 | 1.002 | - |
| test_langClear_Before:scaleage_weeks | 0.127 | 0.199 | -0.263 | 0.530 | 2000.000 | 1.001 | - |
We look at accuracy on clear speech (here we return to the dataset in which trials have not been excluded based on whether participants made a mistake identifying clear speech items).
kable(convert_stan_to_dataframe(expt3.clear), digits=3, caption = "Are children more accurate on clear speech as they age?")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 4.119 | 0.257 | 3.660 | 4.646 | 685.164 | 1.003 | * |
| scaleage_weeks | 0.505 | 0.210 | 0.095 | 0.916 | 941.491 | 1.006 | * |
kable(convert_stan_to_dataframe(expt3.distorted), digits=3, caption = "Are children more accurate on distorted speech as they age?")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.620 | 0.117 | 0.396 | 0.855 | 1092.121 | 1.002 | * |
| scaleage_weeks | 0.279 | 0.110 | 0.056 | 0.503 | 1109.589 | 1.000 | * |
kable(convert_stan_to_dataframe(expt3.distorted.excluded), digits=3, caption = "Are children more accurate on distorted speech as they age? (excluding incorrect questions)")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.625 | 0.114 | 0.398 | 0.858 | 1337.713 | 1.002 | * |
| scaleage_weeks | 0.256 | 0.109 | 0.046 | 0.478 | 1141.699 | 1.004 | * |
We look at children’s accuracy on distorted speech in each age group. 3- and 4-year-olds are above chance, but the CI for 2-year-olds includes 0, even in the top-down clear-speech-first condition.
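These are intercept-only models on the distorted trials for each age group; with two pictures per trial, chance is 50%, i.e. an intercept of 0 in log-odds, so “above chance” means the intercept’s credible interval excludes 0. A sketch (assuming brms; the random-effects term is an assumption):

```r
# Sketch of a per-age intercept-only model (illustrative; assumes brms).
library(brms)

age.2 <- brm(
  acc ~ 1 + (1 | subject_id),
  data   = subset(children, clarity == "distorted" & age == "2"),
  family = bernoulli()
)
# plogis() maps the intercept back to a proportion, e.g. plogis(0.25) is about 0.56.
```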
kable(convert_stan_to_dataframe(age.2), digits=3, caption = "Overall accuracy at Age 2")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.246 | 0.211 | -0.171 | 0.672 | 949.343 | 1.002 | - |
kable(convert_stan_to_dataframe(age.3), digits=3, caption = "Overall accuracy at Age 3")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.667 | 0.238 | 0.198 | 1.148 | 673.22 | 1.004 | * |
kable(convert_stan_to_dataframe(age.4), digits=3, caption = "Overall accuracy at Age 4")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.78 | 0.198 | 0.393 | 1.168 | 1162.333 | 1.002 | * |
kable(convert_stan_to_dataframe(age.2.TD), digits=3, caption = "Overall accuracy at Age 2 in TopDown condition")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.379 | 0.489 | -0.54 | 1.408 | 690.26 | 1.002 | - |
We look at how children’s accuracy improves over the experiment. There is not a significant improvement in accuracy, but gazing at the graphs, I would suggest that there is in fact a small learning effect which we aren’t able to measure with this design.
trial

kable(convert_stan_to_dataframe(expt3.learning ), digits=3, caption = "Learning over trials by age")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.638 | 0.124 | 0.396 | 0.887 | 1174.984 | 1.000 | * |
| scaletrial_number | 0.104 | 0.068 | -0.033 | 0.237 | 2000.000 | 1.000 | - |
| scaleage_weeks | 0.264 | 0.118 | 0.049 | 0.497 | 1288.243 | 1.000 | * |
| scaletrial_number:scaleage_weeks | -0.068 | 0.069 | -0.199 | 0.066 | 2000.000 | 0.999 | - |
z<-0

How often do children request a repetition during training trials (trials 2-17), across the training conditions?
Children in the Clear Before condition are less likely to request a repetition overall, and there is a marginal effect such that the rate at which they request repetitions declines faster over the training phase.
repetition

z<-0
kable(convert_stan_to_dataframe(expt3.rep_train ), digits=3, caption = "Repetitions during training phase by trial number and condition")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.753 | 0.029 | 0.697 | 0.809 | 982.474 | 1.001 | * |
| scaletrial_number | -0.101 | 0.013 | -0.126 | -0.074 | 2000.000 | 1.001 | * |
| test_langClear_Before | -0.101 | 0.026 | -0.156 | -0.051 | 1148.129 | 1.002 | * |
| scaletrial_number:test_langClear_Before | -0.025 | 0.014 | -0.053 | 0.002 | 2000.000 | 0.998 | - |
kable(convert_stan_to_dataframe(expt3.rep_train.before ), digits=3, caption = "Repetitions during training phase by trial number -- Clear Before")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.647 | 0.045 | 0.554 | 0.736 | 469.895 | 1.005 | * |
| scaletrial_number | -0.127 | 0.022 | -0.169 | -0.085 | 2000.000 | 0.999 | * |
kable(convert_stan_to_dataframe(expt3.rep_train.after ), digits=3, caption = "Repetitions during training phase by trial number -- Clear After")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.849 | 0.037 | 0.776 | 0.921 | 640.406 | 1.005 | * |
| scaletrial_number | -0.072 | 0.019 | -0.110 | -0.035 | 2000.000 | 0.999 | * |
Does training regime affect how often children request repetitions during the test phase (trials 18-25)? Recall that test phase is matched across participants.
Children who got Clear-Before training no longer request fewer repetitions when we move to the Test phase.
kable(convert_stan_to_dataframe(expt3.rep_test ), digits=3, caption = "Repetitions during test phase by trial number and condition")

| Parameter | Estimate | Est.Error | l-95% CI | u-95% CI | Eff.Sample | Rhat | Diff_from_zero |
|---|---|---|---|---|---|---|---|
| Intercept | 0.764 | 0.025 | 0.716 | 0.816 | 1593.011 | 1.000 | * |
| scaletrial_number | -0.004 | 0.018 | -0.039 | 0.033 | 2000.000 | 1.000 | - |
| test_langClear_Before | 0.008 | 0.024 | -0.039 | 0.056 | 1489.855 | 1.000 | - |
| scaletrial_number:test_langClear_Before | -0.003 | 0.019 | -0.041 | 0.034 | 2000.000 | 0.999 | - |
We don’t see much in the way of an effect of speaker match (same voice for clear/distorted stimuli).
speaker_match

We don’t see much in the way of an effect of prime distance (whether the to-be-repeated clear sentence in the Clear_Before condition comes immediately prior to the distorted sentence, or one sentence prior).
distance

While 2-year-olds had difficulty interpreting sine wave speech, three- and four-year-olds showed above-chance performance on our task. Nevertheless, their accuracy was still low. Looking at the graphs, children’s accuracy improved a little with practice, but there was no step-change: even 4-year-olds had difficulty with this task. In fact, their performance was numerically lower than in Experiment 1’s 2AFC task. Perhaps this was due to the semantically bleached stimuli we used in Expts 2 & 3 (where the carrier phrase Can you touch… did not provide information about the continuation).
Interestingly, we found little evidence that top-down information plays an important role in children’s accuracy at interpreting the instructions. First, we found little evidence for pop-out effects: during training, accuracy was not much higher in the Clear Before condition than in the Clear After condition. In addition, children who were given a top-down learning protocol were not better at test. Note, though, that both results were marginal: it seems more likely that there was a small effect we couldn’t measure than that there was no effect at all.
However, top-down information did appear to affect the number of repetitions that participants asked for (at least for ages 3 and 4). Under “pop-out” conditions, children were less likely to request repetitions. Unfortunately, we can’t tell whether this reflects improved understanding or improved confidence in the Clear_Before group. Because the Clear Before group shows no overall accuracy advantage, and because more repetitions are actually associated with worse performance (n.b. that analysis is not shown here), we suspect that top-down information is actually improving confidence rather than accuracy.