Kids’ Experiment

Task description

Each experiment contains 4 trials because a previous pilot run with 4 children showed that 6 trials exceeded their attention span. Each trial contains 6 training slides (unambiguous utterances) and 1 test slide (an ambiguous utterance), as shown below. Instead of showing the words on the screen as in the adult pilot, we play sound recordings of the animals.

Left: training slide, Right: test slide

The items appearing in a trial are drawn from a set of 3 categories (out of 6: musical instruments, fruits, vehicles, furniture, mammals, clothes), and each of the 4 trials uses a different set. We run the experiment with distribution (6-0-0) across all 4 trials, where each number is how many training slides show an item from the corresponding category. This tests whether children can make a category inference from the most informative input.
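The trial structure described above can be sketched in code. This is a minimal illustration, not the script used in the experiment: the function name `make_trials`, the dictionary layout, and the reading that no two trials share the same set of 3 categories are all assumptions made for the example.

```python
import itertools
import random

# Category names follow the labels used in this summary.
CATEGORIES = ["instruments", "fruits", "vehicles", "furniture", "mammals", "clothes"]


def make_trials(n_trials=4, n_training=6, seed=None):
    """Build n_trials trials, each with its own set of 3 categories (hypothetical helper).

    Assumption: "unique" means no two trials use the same set of 3 categories.
    Every trial uses the (6-0-0) distribution: one category supplies all
    training items, the other two supply none.
    """
    rng = random.Random(seed)
    all_sets = list(itertools.combinations(CATEGORIES, 3))  # 20 possible sets
    chosen_sets = rng.sample(all_sets, n_trials)  # sampled without replacement

    trials = []
    for category_set in chosen_sets:
        most_frequent = rng.choice(category_set)
        counts = {c: (n_training if c == most_frequent else 0) for c in category_set}
        trials.append(
            {
                "categories": category_set,
                "most_frequent": most_frequent,
                "training_counts": counts,
            }
        )
    return trials


if __name__ == "__main__":
    for trial in make_trials(seed=1):
        print(trial)
```

Sampling the category sets without replacement is what keeps them distinct across trials; which category becomes the most frequent one is chosen at random here, whereas the real design may have counterbalanced it.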

Unlike in the adult version, we moved the items first, because in the first pilot run children showed a strong bias toward picking the middle item, presumably because they were trying to interpret the agent’s gaze.

Sample size

We exclude responses with fewer than 5 of the 6 training slides correct.

age	n
2	2
3	6
4	3
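The exclusion rule stated above (fewer than 5 of 6 training slides correct) could be applied as in the following sketch. The record layout (a `training_correct` list of six booleans per response) and the helper names are hypothetical, and the two records at the bottom are made-up examples rather than pilot data.

```python
MIN_TRAINING_CORRECT = 5  # threshold from the exclusion rule: at least 5 of 6


def passes_training(training_correct, threshold=MIN_TRAINING_CORRECT):
    """Return True if enough training slides were answered correctly."""
    return sum(bool(x) for x in training_correct) >= threshold


def filter_responses(responses):
    """Keep only responses that meet the training criterion."""
    return [r for r in responses if passes_training(r["training_correct"])]


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [
        {"id": "a", "training_correct": [True, True, True, True, True, False]},
        {"id": "b", "training_correct": [True, False, True, False, True, True]},
    ]
    print([r["id"] for r in filter_responses(example)])
```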

Finding

Overall

The dotted line marks performance expected by chance (1/3), and the error bars are 95% CIs.

More participants chose the item from the most frequent category than from either of the other two categories; however, performance is not much above chance level.
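One way to check whether a proportion is distinguishable from chance (1/3) is to compute a 95% confidence interval and see whether it contains 1/3. The sketch below uses the Wilson score interval. This is an assumption: the summary does not say which method produced the CIs in the plots, and the counts in the usage lines are placeholders, not the pilot results.

```python
import math

CHANCE = 1 / 3  # three items on the test slide


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Placeholder counts for illustration only.
    low, high = wilson_ci(successes=10, n=20)
    print(f"95% CI: [{low:.3f}, {high:.3f}]")
    print("CI includes chance:", low <= CHANCE <= high)
```

The Wilson interval is used here because it behaves reasonably with small samples; with samples this small the interval is wide, so a proportion numerically above 1/3 can still be compatible with chance.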

By age group

We then look at the results by age group.

In the data, 2-year-olds appear to be best at choosing the item from the most frequent category, followed by 3- and 4-year-olds, in that order. However, this may be due to the small number of participants in each age group and to how the most frequent categories were distributed.

Choosing the middle item

When we include only cases in which participants did not choose the correct item, a higher proportion of participants chose the middle item than the left or right item.

Result by category

During the pilot, participants also seemed to pick up on some categories more than others.

For reference, the number of times each category appeared as the most frequent category is as follows:

category	n
clothes	6
fruits	8
furniture	7
instruments	6
mammals	9
vehicles	8

Here we plot only the most frequent category. Participants picked up on mammals the most, followed by vehicles, fruits, clothes, instruments, and furniture.

Result by trial

We also look at whether participants change their choice strategy across trials, for example because they have worked out that the task is to choose the item from the most frequent category and therefore perform better over time.

The 4th trial has the best performance (the most participants choosing the item from the most frequent category), but performance declines in the 3rd trial.

Moving forward

Design decisions

Four trials appears to be a suitable length for the attention span of participants in the age range we are looking at.
Based on the analysis of item category salience, we will remove the furniture and instruments categories, which did not perform as well as the other 4 categories (mammals, vehicles, fruits, clothes).

Data collection

We will pilot 10 more children with 4 categories and 4 trials with distribution (6-0-0). If the results point in the same direction as the previous results, we will run a full sample of 2-, 3-, and 4-year-olds (20 children per age group).
If the pilot results suggest that children cannot do the task, we will change the procedure so that the agent explicitly names the target category (for example, asking for “car” 6 times instead of for different vehicles).

DisCon Kids Pilot Summary