Load libraries

library(ggplot2)
library(dplyr)
library(tidyr)
library(purrr)

theme_set(theme_bw(base_size = 16))

Generate two hypothetical parents that differ in their distributions of naming event qualities.

easy <- rbeta(50, 4, 3)

hard <- rbeta(50, 2, 3)

parents <- data.frame(event = 1:50, easy = easy, hard = hard) %>%
  gather(type, size, -event)


ggplot(parents, aes(x = size)) + 
  facet_wrap(~ type) + 
  geom_histogram(bins = 10, color = "black", fill = "white") + 
  theme(panel.grid = element_blank())

parents %>%
  group_by(type) %>%
  summarise(mean = mean(size))
## # A tibble: 2 × 2
##    type  mean
##   <chr> <dbl>
## 1  easy 0.576
## 2  hard 0.428

Suppose that learning works like this: Children observe a bunch of naming events. For each one, they have a probability of learning that is equal to the the 1/100 * size of the object. If they learn from at least one event, we say they learned it–there’s no aggregation.

test <- function(vals) {
  samples <- runif(length(vals), 0, 100) < vals
  
  sum(samples) >= 1
}

Now let’s see what this looks like for the easy and hard events. Simulate each one 1000 times and see what proportion of kids learn from each parent.

test_kids <- parents %>%
  split(.$type) %>%
  map(function(df) mean(replicate(1000, test(df$size)))) %>%
  bind_rows(.$type)

test_kids
## # A tibble: 1 × 2
##    easy  hard
##   <dbl> <dbl>
## 1 0.271  0.19

Ok, so the number of kids is higher for the easier parent, as expected

Now let’s try to sub-sample the naming events using the moving threshold technique.

# Subsample to the data to be less than or greater than a threshold, or alternatively just subset to a random proportion of the trails equal to threshold
test_prop <- function(vals, thresh = 1, type = "greater") {
  if(type == "less") {
        filtered_vals <- vals[vals <= thresh]
  }
  else if(type == "greater") {
     filtered_vals <- vals[vals >= thresh]
  }
  else {
    filtered_vals <- sample(vals, thresh * length(vals))
  }
  
  mean(replicate(10000, 
                test(filtered_vals)))
}
threshes <- seq(0, 1, .1)

simulations <- expand.grid(thresh_type = c("greater", "less", "prop"),
                                parent_type = c("easy", "hard"),
                                thresh = seq(0, 1, .1)) %>%
  rowwise() %>%
  mutate(prop_learned = test_prop(filter(parents, type == parent_type)$size,
                                  thresh = thresh,
                                  type = thresh_type))
ggplot(simulations, aes(x = thresh, y = prop_learned, fill = parent_type)) + 
  facet_grid(. ~ thresh_type) +
  geom_bar(stat = "identity", position = "dodge")

diffs <- simulations %>%
  spread(parent_type, prop_learned) %>%
  mutate(diff = easy - hard)

ggplot(diffs, aes(x = thresh, y = diff)) + 
  facet_grid(. ~ thresh_type) +
  geom_point() +
  scale_x_continuous(breaks = threshes) + 
  geom_hline(aes(yintercept = 0), linetype = "dashed") + 
  theme(panel.grid = element_blank())

```