A race model for communication in parent-child interactions

Simulation 1
Simulation 2
Simulation 3
Analysis 1
Analysis 2
Analysis 3

Let’s assume that the primary reason that parents and children use language is to communicate. We’ll suppose that communication begins with an intention to refer to something in the world. After the intention is fixed, the communicator has two choices of modalities: (1) speech—the communicator can produce the lexical item that refers to the entity, or (2) gesture—the communicator can produce a deictic gesture.

We’ll start by modeling this choice as a race between these two independent modalities. We’ll use the Linear Ballistic Accumulator (LBA) model to implement this race (below). The process goes as follows:

Some amount of time is required to produce the initial intention to communicate (\(t_{0}\)). I set this to .5s somewhat arbitrarily, but I suspect that is the right ballpark.
Each modality begins with some amount of initial activation that ranges from 0 to \(A\). This is supposed to account for some amount of trial-to-trial variability in response times. I set \(A\) to 0 because it shouldn’t matter for our purposes.
Each modality has a drift rate that determines how quickly activation accumulates over time. These drift rates are modeled as draws from a \(Normal(\mu,\sigma)\). We might assume that these drift rates vary as a function of things like how well known the lexical item is, how difficult the word form is to produce.
Accumulation proceeds until the information reaches a threshold (\(b\)) in some modality, at which time a response is produced. I set the thresholds to be the median conversational response time estimated by Frank, Lewis, and MacDonald’s (under review) analysis of the Fernald and Morikawa (1993) corpus of 12- and 18-month-olds and their parents.

LBA model

adapted from Donkin, Brown, & Heathcote (2011)

Simulation 1

Let’s set up a simple example just to get a feel for things. Let’s assume that words vary in their production difficulty, and suppose further that this difficulty distribution is normal with high variance (following McMurray, 2007). Let’s say that drift rates for adults correspond to this.

Let’s suppose that points are relatively costly to produce, but don’t vary much. We expect under this model that adults will sometimes point to things instead of producing the words for them, and should be more likely to do so for harder-to-produce words. Let’s set this up.

#Parameters
A = 0 #Starting evidence max
b = 3 #Decision threshold
t0 = .5 #Non-Decision time
mean_word = 1.75
sd_word = .5 # Words are easy but variable
mean_gesture = 1
sd_gesture = .2 #gestures are hard but consistent

# Fit distribution
r_lba <- rlba_norm(1e4, A, b, t0, c(mean_word, mean_gesture), c(sd_word, sd_gesture)) %>%
  mutate(response = factor(response, levels = c(2,1), labels = c("gesture", "word")))

#compute summary
summary_lba <- r_lba %>%
  group_by(response) %>%
  summarise(prop = n()/nrow(.), rt = mean(rt))

kable(summary_lba)

response	prop	rt
gesture	0.0828	3.186622
word	0.9172	2.254281

ggplot(data = r_lba, aes(x = rt, fill = response)) +
  facet_grid(response ~ .) + 
  geom_density(aes_string(y ="..count..")) +
  theme_bw(base_size = 16) + 
  geom_vline(aes(xintercept = rt), size = 1, lty = "dashed", data = summary_lba) + 
  theme(panel.grid = element_blank(), legend.position = "none",
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1)) +
  scale_x_continuous(limits = c(0, 6), name = "Reponse Time (seconds)") +
  scale_fill_brewer(palette = "Set1")

Simulation 2

Let’s set up a second more complex example. Suppose, for children, that the drift rate for words is a combination of whether they know the word, and if so how hard it is to produce. Let’s suppose that chidren are more likely to know easier words.

Let’s rank order words from 1 to 1000, and assume that children know words with probability proportional to 1/log(rank).

know_word <- sapply(1:1000, function(x) rbinom(1, 1, 1/(log(x+2))))
rate_word <- (1.5 + (1/log(1:1000+2))) * know_word

rate_word_data <- data.frame(rate = rate_word, word = 1:1000) %>%
  mutate(rate = ifelse(rate == 0, NA, rate))

ggplot(rate_word_data, aes(x = word, y = rate)) +
  geom_point(size = .5, color = "#377eb8") +
  theme_bw(base_size = 16) +
  theme(panel.grid = element_blank(), legend.position = "none",
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1)) +
  scale_y_continuous(limits = c(0, 3))

Then let’s draw the difficulty of each word from a normal with a mean of (1 + 1/log(rank)) and sd of .2. Let’s test each word 100 times.

#Parameters
sd_word_2 = .2 # Individual words are consistent

r_child <- lapply(rate_word, function(rate) rlba_norm(100, A, b, t0, 
                                                      c(rate, mean_gesture),
                                                      c(sd_word_2, sd_gesture))) %>%
  bind_rows() %>%
  mutate(response = factor(response, levels = c(2,1), labels = c("gesture", "word")))

#compute summary
summary_child <- r_child %>%
  group_by(response) %>%
  summarise(prop = n()/nrow(.), rt = mean(rt))

kable(summary_child)

response	prop	rt
gesture	0.82796	3.637381
word	0.17204	2.304743

ggplot(data = r_child, aes(x = rt, fill = response)) +
  facet_grid(response ~ .) + 
  geom_density(aes_string(y ="..count..")) +
  theme_bw(base_size = 16) + 
  geom_vline(aes(xintercept = rt), size = 1, lty = "dashed", data = summary_child) + 
  theme(panel.grid = element_blank(), legend.position = "none",
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1)) +
  scale_x_continuous(limits = c(0, 6), name = "Reponse Time (seconds)") +
  scale_fill_brewer(palette = "Set1")

Simulation 3

Now let’s check what happens as we increase the ease of producing speech but keep gesture constant. We should see changes in RT for both word and gesture, even though gesture itself doesn’t change in it’s time-to-produce! This is because in order to beat an easy-to-produce word, gesture has to come out really quickly.

#Parameters

word_types <- c("easy", "medium", "hard")
words <- c(2.5, 1.5, .5)

sd_word = .5 # Words are easy but variable

mean_gesture = 1
sd_gesture = .2 #gestures are hard but consistent

fit_lba <- function(type, mean_word) {
  r_lba <- rlba_norm(1e4, A, b, t0, c(mean_word, mean_gesture), 
                      c(sd_word, sd_gesture)) %>%
  mutate(response = factor(response, levels = c(2,1), labels = c("gesture", "word")),
         type = type) 
}

all_lbas <- map2(word_types, words, fit_lba) %>%
  bind_rows() %>% 
  mutate(type = factor(type, levels = c("easy", "medium", "hard")))

#compute summary
summary_lbas <- all_lbas %>%
  group_by(type, response) %>%
  summarise(n = n(), rt = mean(rt)) %>%
  group_by(type) %>%
  mutate(n = n/sum(n))


kable(summary_lbas)

type	response	n	rt
easy	gesture	0.0025	3.057335
easy	word	0.9975	1.749917
medium	gesture	0.1822	3.280074
medium	word	0.8178	2.446299
hard	gesture	0.7915	3.528542
hard	word	0.2085	3.181818

ggplot(data = all_lbas, aes(x = rt, fill = response)) +
  facet_grid(type ~ response) + 
  geom_density(aes_string(y ="..count..")) +
  theme_bw(base_size = 16) + 
  geom_vline(aes(xintercept = rt), size = 1, lty = "dashed", data = summary_lbas) + 
  theme(panel.grid = element_blank(), legend.position = "none",
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1)) +
  scale_x_continuous(limits = c(0, 6), name = "Reponse Time (seconds)") +
  scale_fill_brewer(palette = "Set1")

Analysis 1

Now let’s take a look at two parent-child dyads’ data to see if we can find anything that looks like a signature of this kind of process. In particular, I want to see whether there is systematicity in the kinds of intentions that typically get expressed by words or by gestures. Ideally, we’d also like to try to look at the time it takes to produce these communicative acts, but that may need to wait.

files <- list.files("data/", "*.csv", full.names = TRUE)

loaded_subjs <- map(files, read_csv) %>%
  bind_rows() %>%
  mutate(age = ifelse(session == 2, 18, 22))

# Also load CHIDLES frequencies for words to get an alternate estimator
eng_freqs <- read_csv("data/other/counts_english.csv")
# 
# miscoded <- loaded_subjs %>%
#   filter((referent != spoken_obj) & (referent != gestured_obj))

demos <- loaded_subjs %>%
  select(subj,age) %>%
  distinct(subj,age) %>%
  group_by(age) %>%
  summarise(n = n())
  
kable(demos)

age	n
18	8
22	8

Now let’s try our first analysis: Which referents come out as words, which come out as gestures?

# Code matching referents in multi-referent communications
overlapping_referents <- function(refs1, refs2) {
  refs1 <- str_split(refs1, ",")[[1]]
  refs2 <- str_split(refs2, ",")[[1]]

  overlapping_refs <- sapply(refs1,
                             function(ref) contains(str_detect(refs2, ref),
                                                    TRUE))
  contains(overlapping_refs, TRUE)
}

# Fix up multi-referent communications
max_refs <- loaded_subjs %>%
  gather(ref_col, refs, spoken_obj,gestured_obj,referent) %>%
  mutate(num_refs = str_count(refs,",")) %>%
  summarise(num_refs = max(num_refs, na.rm = T)) %>%
  as.integer()

ref_list <- sapply(1:(max_refs + 1),function(x) paste0("referent",x))

spread_referents<- loaded_subjs %>%
  separate_("referent", ref_list, sep = ",", fill = "right") %>%
  gather_("number", "referent", ref_list) %>%
  filter(!is.na(referent), referent != "picture") %>%
  mutate(referent = tolower(str_trim(referent)))

# Consolidate referents that are labelled differently across dyads
consolidate_refs <- function(referent) {
  if(is.na(referent)) as.character(NA)
  else if(referent == "booty" | referent == "bottom") "butt"
  else if(referent == "chick") "chicken"
  else if(referent == "person") "man"
  else if(referent == "poopoo") "poop"
  else if(referent == "rabbit") "bunny"
  else if(referent == "steffi") "stephie"
  else if(referent == "stomach" | referent == "tummy") "belly"
  else if(referent == "taxicab") "taxi"
  else if(referent == "telephone") "phone"
  else if(referent == "toolbox") "box"
  else referent
}

# Filter out abstract words that are difficult to communicate by gesture
filtered_referents <- spread_referents %>%
  filter(!referent %in% c("cold", "conversation", "date", "day", "downtown",
                          "fine", "ER", "ingredients of sandwich", "memory",
                          "morning", "one", "owner", "pound", "problem", 
                          "project", "request", "talk", "thief", "test", 
                          "final","time", "timeout", "upstairs", "weekend", 
                          "work", "story", "lunch", "hour", "minute", "damage", 
                          "second", "party", "birthday")) %>%
  rowwise() %>%
  mutate_each(funs(consolidate_refs), referent, spoken_obj, gestured_obj)


# Code responses and estimate response times
coded_responses <- filtered_referents %>%
  rowwise() %>%
  mutate(spoken = overlapping_referents(spoken_obj, referent) &
                         !overlapping_referents(gestured_obj, referent),
            gestured = overlapping_referents(gestured_obj, referent) &
                         !overlapping_referents(spoken_obj, referent),
            both = overlapping_referents(spoken_obj, referent) &
                         overlapping_referents(gestured_obj, referent)) %>%
  group_by(subj, age) %>%
  arrange(time) %>%
  mutate(rt = time - lag(time)) %>%
  rowwise() %>%
  mutate(rt = if(is.na(rt) | rt == 0 ) as.numeric(NA) else rt)

# Compute frequency of response by modality as well as RT
referent_counts <- coded_responses %>%
  gather(modality, value, spoken, gestured, both) %>%
  filter(value) %>%
  group_by(referent, person, modality) %>%
  summarise(n = n(),
            rt = mean(rt))

# Find all of the unique referents, divide them up by frequency
unique_refs <- referent_counts %>%
  group_by(referent) %>%
  summarise(total_n = sum(n)) %>%
  arrange(desc(total_n)) %>%
  mutate(rank = row_number()) %>%
  mutate(stem = wordStem(referent, "english")) %>%
  left_join(eng_freqs, by = c("stem" = "item")) %>%
  rename(stem_freq = count) %>%
  mutate(stem_cut = cut(stem_freq, 4),
         n_cut = cut(log(total_n), 4))

# Rearrange refernts by frequency
referent_counts_ordered <- referent_counts %>%
  left_join(unique_refs) %>%
  ungroup() %>%
  mutate(referent = factor(referent, levels = unique_refs$referent),
         modality = factor(modality, levels = c("gestured", "spoken", "both")))

#   
# missing_referents <- referent_counts %>%
#   filter((spoken + gestured + both) < n)


# Plot
ggplot(referent_counts_ordered, aes(x = referent, y = log(n+1), fill = modality)) +
  geom_bar(stat = "identity") + 
  facet_grid(person  ~ .) +
  theme_bw(base_size = 14) +
  theme(panel.grid = element_blank(), legend.position = c(.85,.85),
        legend.title = element_blank(), legend.text=element_text(size=10),
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5,size = 8)) +
  scale_x_discrete(name = "Referent") +
  scale_y_continuous(name = "Log Production Frequency") +
  scale_fill_brewer(palette = "Set1")

We see some evidence that gesture is distributed differently for children and adults. Children appear to show the predicted pattern (more gesture for rarer words), adults do not. Let’s test this statistically.

total_usage <- referent_counts_ordered %>%
  select(referent, total_n, rank, n_cut) %>%
  distinct()

person_usage <- coded_responses %>%
  gather(modality, value, spoken, gestured, both) %>%
  filter(value) %>%
  group_by(referent, person, subj, modality) %>%
  summarise(n = n(),
            rt = median(rt, na.rm = T)) %>%
  left_join(total_usage) %>%
  mutate(modality = factor(modality, levels = c("gestured", "spoken", "both")))

modality_usage <- person_usage %>%
  group_by(person, n_cut, subj) %>%
  mutate(prop = n/sum(n)) %>%
  group_by(person, n_cut, subj, modality) %>%
  summarise(prop = sum(prop)) %>%
  group_by(n_cut, person, modality) %>%
  multi_boot_standard("prop")

ggplot(modality_usage, aes(x = as.numeric(n_cut), y = mean, color = modality,
                           label = modality)) +
  facet_grid(person  ~ .) +
  geom_pointrange(aes(ymax = ci_upper, ymin = ci_lower)) +
  geom_line() + 
  theme_bw(base_size = 14) +
  theme(panel.grid = element_blank(), legend.position = "none",
        legend.title = element_blank(), legend.text=element_text(size=10),
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1))+
  scale_x_continuous(name = "Log Frequency Quartile", limits = c(.5, 5), 
                     breaks = seq(1,4)) +
  scale_y_continuous(name = "Modality Probability") +
  scale_color_brewer(palette = "Set1") + 
  geom_dl(method = list(dl.trans(x=x +.2), "last.qp", cex=1))

I think this is in line with our expectations for children: More frequent referents are produced most often as word, and rarer referents as gestures. It’s hard to say what’s happening with parents. There is actually some evidence for the predicted effect, but it’s weaker. Which might make sense in this context, given that both knowledge and fluency contribute for children, but presumably parents already know all of these words.

Now let’s try RTs.

modality_rt <- person_usage %>%
  group_by(n_cut, person, modality, subj) %>%
  summarise(rt = mean(rt, na.rm = T)) %>%
  group_by(n_cut, person, modality) %>%
  multi_boot_standard("rt", na.rm = T)

ggplot(modality_rt, aes(x = as.numeric(n_cut), y = mean, color = modality,
                           label = modality)) +
  facet_grid(person  ~ .) +
  geom_pointrange(aes(ymax = ci_upper, ymin = ci_lower),
                  position = position_dodge(.1)) +
  geom_line(position = position_dodge(.1)) + 
  theme_bw(base_size = 14) +
  theme(panel.grid = element_blank(), legend.position = "none",
        legend.title = element_blank(), legend.text=element_text(size=10),
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1))+
  scale_x_continuous(name = "Log Frequency Quartile", limits = c(.5, 5), 
                     breaks = seq(1,4)) +
  scale_y_continuous(name = "Median Production Time") +
  scale_color_brewer(palette = "Set1") + 
  geom_dl(method = list(dl.trans(x=x +.3), "last.qp", cex=1))

This seems right! Gesture is slower than speech and scales with frequency. Harder to see what’s happening with speech, but this might clear up as we keep adding data.

Let’s try the density representation:

modality_density <- person_usage %>%
  ungroup() %>%
  filter(modality != "both") %>%
  group_by(n_cut, person, modality, subj, referent) %>%
  summarise(n = n(),
            rt = median(rt, na.rm = T))

modality_summary <- modality_density %>%
  group_by(person, modality) %>%
  summarise(rt = median(rt, na.rm = T))

ggplot(data = modality_density, aes(x = rt, fill = modality)) +
  facet_grid(modality ~ person) + 
  geom_density(aes_string(y ="..count..")) +
  theme_bw(base_size = 16) + 
  geom_vline(aes(xintercept = rt), size = 1, lty = "dashed", data = modality_summary) + 
  theme(panel.grid = element_blank(), legend.position = "none",
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1)) +
  scale_x_continuous(limits = c(0, 1), name = "Reponse Time (seconds)") +
  scale_fill_brewer(palette = "Set1")

Analysis 2

Now let’s try a different representation and test a further hypothesis: As children learn more words, they should gesture less, and this effect should be most pronounced for low-frequency words.

So let’s aggregate data separately for each session.

person_session_usage <- coded_responses %>%
  gather(modality, value, spoken, gestured, both) %>%
  filter(value) %>%
  group_by(referent, person, subj, age, modality) %>%
  summarise(n = n(),
            rt = median(rt, na.rm = T)) %>%
  left_join(total_usage) %>%
  mutate(modality = factor(modality, levels = c("gestured", "spoken", "both")))

modality_session_usage <- person_session_usage %>%
  group_by(person, n_cut, subj, age) %>%
  mutate(prop = n/sum(n)) %>%
  group_by(person, n_cut, subj, age, modality) %>%
  summarise(prop = sum(prop)) %>%
  group_by(n_cut, person, age, modality) %>%
  multi_boot_standard("prop")

ggplot(modality_session_usage, aes(x = as.numeric(n_cut), y = mean, color = modality,
                           label = modality)) +
  facet_grid(person ~ age) +
  geom_pointrange(aes(ymax = ci_upper, ymin = ci_lower)) +
  geom_line() + 
  theme_bw(base_size = 14) +
  theme(panel.grid = element_blank(), legend.position = "none",
        legend.title = element_blank(), legend.text=element_text(size=10),
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1))+
  scale_x_continuous(name = "Log Frequency Quartile", limits = c(.5, 5), 
                     breaks = seq(1,4)) +
  scale_y_continuous(name = "Modality Probability") +
  scale_color_brewer(palette = "Set1") + 
  geom_dl(method = list(dl.trans(x=x +.2), "last.qp", cex=1))

That looks about right. Let’s try RT

modality_session_rt <- person_session_usage %>%
  group_by(n_cut, person, modality, subj, age) %>%
  summarise(rt = mean(rt, na.rm= T)) %>%
  group_by(n_cut, person, age, modality) %>%
  multi_boot_standard("rt", na.rm = T)

ggplot(modality_session_rt, aes(x = as.numeric(n_cut), y = mean, color = modality,
                           label = modality)) +
  facet_grid(person ~ age) +
  geom_pointrange(aes(ymax = ci_upper, ymin = ci_lower),
                  position = position_dodge(.1)) +
  geom_line(position = position_dodge(.1)) + 
  theme_bw(base_size = 14) +
  theme(panel.grid = element_blank(), legend.position = "none",
        legend.title = element_blank(), legend.text=element_text(size=10),
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1))+
  scale_x_continuous(name = "Log Frequency Quartile") +
  scale_y_continuous(name = "Median Production Time") +
  scale_color_brewer(palette = "Set1") + 
  geom_dl(method = list(dl.trans(x=x +.3), "smart.grid", cex=1))

Again it’s a little bit hard to say… Probably need to actually code RTs more sensibly

Analysis 3

Let’s try looking at individual words

# Subset down to only those referents found in multiple sessions
used_both_ages <- person_session_usage %>%
         filter(person == "child", modality != "both") %>%
  select(n_cut, referent, age, subj, n) %>%
  group_by(n_cut, referent, age, subj) %>%
  summarise(n = n()) %>%
  spread(age, n) %>%
  filter(!is.na(`18`) & !is.na(`22`)) %>%
  gather(age, n, `18`, `22`) %>%
  mutate(age = as.numeric(age)) %>%
  select(-n) 

# Compute proportion spoken for each referent in each session for each child
indiv_ref_props <- left_join(used_both_ages,
                              filter(person_session_usage, person == "child",
                                     modality != "both")) %>%
  select(-rt) %>%
  spread(modality, n) %>%
  mutate(spoken = ifelse(is.na(spoken), 0, as.numeric(spoken)),
         gestured = ifelse(is.na(gestured), 0, as.numeric(gestured))) %>%
  gather(modality, n, gestured, spoken) %>%
  group_by(n_cut, referent, age, subj) %>%
  mutate(prop = n/sum(n)) %>%
  group_by(modality, add = TRUE) %>%
  summarise(prop = mean(prop)) %>%
  filter(modality == "spoken") %>%
  select(-modality) %>%
  filter(n_cut != "(-0.00663,1.66]") # Too sparse

aggregate_ref_props <- indiv_ref_props %>%
  group_by(n_cut, age, referent, subj) %>%
  summarise(prop = mean(prop)) %>%
  summarise(prop = mean(prop)) %>%
  group_by(n_cut, age) %>%
  multi_boot_standard("prop")
  
ggplot(aggregate_ref_props, aes(x = as.numeric(as.factor(n_cut)), 
                                y = mean, color = as.factor(age),
                                label = as.factor(age))) +
  geom_pointrange(aes(ymax = ci_upper, ymin = ci_lower),
                  position = position_dodge(.1)) +
  geom_line(position = position_dodge(.1)) + 
  theme_bw(base_size = 14) +
  theme(panel.grid = element_blank(), legend.position = "none",
        legend.title = element_blank(), legend.text=element_text(size=10),
        axis.title.x = element_text(vjust=-.5), axis.title.y = element_text(vjust=1))+
  scale_x_continuous(name = "Log Frequency Quartile", limits = c(.5, 5), 
                     breaks = seq(1,4)) +
  scale_y_continuous(name = "Spoken Production Probability") +
  scale_color_brewer(palette = "Set1") + 
  geom_dl(method = list(dl.trans(x=x +.5), "last.qp", cex=1))