The Wisdom of Todd’s Crowd

Looking at the data
Do people get more or less accurate as they read others’ answers?
If crowds were people

We’ve known for a long time that groups can be remarkably good at producing accurate estimates, even when the average group member member is not.¹

Earlier today, a friend of mine posted a picture of a bag full of coins and asked his internet friends to guess how much money was in it. More than 100 people responded. I took it as an opportunity to test the wisdom of Todd’s crowd, to play with graphs, and to use the information to make a data-driven guess in hopes of winning a free dinner.

Looking at the data

For every place that there is output (graphs or math stuff) there is a little link to the top/right that you can click to see the code that produced it. There’s a button on the top right of the whole page that you can click to expand all of the code.

The graph below plots:²

Individual guesses (black dots)
Crowd “guesses” (turqoise stars)
The correct answer (pink line)
The distribution of guesses in the form of a density curve (purple line on the side)

One way to think about what the teal stars represent is that they are the guesses that a person would have made at each point in time if they simply averaged all of the previous guesses.

library(readr)
library(ggplot2)
library(tidyverse)
library(scales)
library(ggExtra)
library(knitr)
library(kableExtra)

setwd("/Users/joelmlevin/Desktop/toddcoinz")

#creating the data frame
guesses <- as.numeric(read_csv("~/Desktop/toddcoinz/toddguesses.csv", col_names = FALSE)$X1)
order <- rep(1:round(length(guesses),0), 1) #just creates a variable that numbers each guess
data <- as.tibble(cbind(order, guesses)) #putting the variables together

#making Lauren's guess bigger
data <- data %>%
  mutate(lauren = ifelse(guesses==444.44, 10, 2))

#a function to add density plots to each graph
addhist <- function(agraph) {
  ggMarginal(agraph, color="purple", margins = "y")
}

#this isn't necesary, but makes a fun color palette
cols <- sample(hue_pal()(round(length(guesses))))

#this is the graph without excluding outliers. 
#(ggplot(data, aes(order, guesses)) + 
  #geom_point(color=cols, size=data$lauren) +
  #geom_smooth(se = FALSE, color="pink") + #defaults to loess curve
  #geom_smooth(se = FALSE, method="lm", color="darkturquoise")) %>% #linear model 
  #addhist()

#mean(data$guesses)
#median(data$guesses)

#plus or minus two standard deviations
toprange <- mean(data$guesses) + 2 * sd(data$guesses)
bottomrange <- mean(data$guesses) - 2 * sd(data$guesses)

#filtering out guesses that are more than two standard deviations below the mean
data.2 <- data %>%
  filter(guesses>bottomrange) %>%
  mutate(lauren = ifelse(guesses==444.44, 10, 2))

#(ggplot(data.2, aes(order, guesses)) +
 #   geom_point(color=cols, size=data.2$lauren) +    
  #  geom_hline(yintercept = 447.5, color="green")) %>%
  #addhist()

mean_guess <- round(mean(data.2$guesses),2)
median_guess <- median(data.2$guesses)

data.6 <- data %>% 
  filter(guesses>bottomrange) %>% #let's say that the crowd knows when a guess is too small
  mutate(correct = 447.50,
         ind_dev = abs(guesses-correct)) 
#calculating the cumulative mean
crowd_est <- NA
for(i in 1:length(data.6$guesses)) {
  crowd_est[i] <- as.vector(mean(data.6$guesses[1:i]))
}
data.6 <- as.tibble(cbind(data.6, crowd_est))

baseplot <- ggplot(data.6, aes(order, guesses)) +
    geom_point(size=2) +
    geom_point(aes(order, crowd_est), color="darkturquoise", size=2, shape=8) +
    geom_hline(yintercept = 447.50, color="hotpink", size=1) +
    labs(x="Number of guesses", y="Guess amounts") 

ggMarginal(baseplot, color="purple", margins = "y", y=guesses)

The correct answer is: $447.50. The wisdom of the crowds estimate was $430.18 – a difference of only $17.32 or less than 4%. By contrast, the average person was off by $137.74 or more than 30%.

#performance of the crowd
correct_answer <- 447.50
crowd_wisdom <- 430.18

#absolute difference
#paste("The wisdom of the crowd was off by $", correct_answer - crowd_wisdom, " or ", round( (correct_answer-crowd_wisdom)/((correct_answer+crowd_wisdom)/2)*100,2), "%.", sep="")

#performance of individuals

#absolute difference
crowd_deviation <- 
  data.2 %>%
  mutate(deviation = abs(guesses-correct_answer)) %>%
  summarise(answer = mean(deviation)) %>%
  as.numeric

#paste("The average person was off by $", crowd_deviation, " or ", round((crowd_deviation/correct_answer)*100,2), "%.", sep="")

We can also compare accuracy between the average person and the crowd over time. Here, smaller values on the y-axis represent more accurate estimations. The crowd is much more accurate than the average person after around 10 guesses have been made.

#compaing the crowd estimate to people
data.6 <- data.6 %>% 
  select(order, correct, crowd_est, ind_dev) %>%
  mutate(crowd_dev = abs(crowd_est-correct))

#cumulative individual deviation
cummulative_ind_dev <- NA
for(i in 1:length(data.6$correct)) {
  cummulative_ind_dev[i] <- as.vector(mean(data.6$ind_dev[1:i]))
}

data.6 <- as.tibble(cbind(data.6, cummulative_ind_dev)) 
data.6 <- data.6 %>%
  mutate(performance_difference = cummulative_ind_dev-crowd_dev)

ggplot(data.6) +
    geom_point(aes(order, crowd_dev), name="Crowd accuracy", color="darkturquoise", size=4, shape=8) +
    geom_point(aes(order, cummulative_ind_dev), name="Individual accuracy (cummulative)", size=3) +
    geom_hline(yintercept = 0, color="purple") +
    labs(x="Number of guesses", y="Deviation (absolute value)") +
    annotate("text", x = c(90,90), y = c(34,140), label = c("Crowd", "Individuals"))

Finally, we can look directly at how much better the crowd is than the average person. Here, larger values mean a larger performance gap between the crowd and individuals – the same as the distance between the two trends in the previous graph.

ggplot(data.6) +
    geom_point(aes(order, performance_difference), name="Performance difference", size=3, color="green", shape=8) +
    labs(x="Number of guesses", y="Performance difference (crowd over average person)")

Do people get more or less accurate as they read others’ answers?

It’s possible that peoples’ guesses become more or less accurate over time because of their exposure to earlier guesses.

The graph below plots deviation (how wrong people are) over time. Smaller values (on the y-axis) correspond to more accurate guesses. Color represents whether a guess was above or below the correct answer (so only green dots were eligible to win per Price is Right rules.) We can see that there are a handful of dots along the bottom of the plot. Only one of them is green, which indicates a guess below the correct answer. This is Lauren’s winning guess.

In general, guesses made later tend to be less accurate. This relationship is not statistically significant at conventional levels when excluding the low outliers, but is significant when they are left in the sample. We should also note that there could be all sorts of other explanations for this. For example, people who check Facebook around the time that Todd made the post could just be better at guessing than people who checked a few hours later.

data.5 <- data %>% 
  filter(guesses>bottomrange) %>%
  mutate(deviation = abs(guesses-correct_answer),
         over.under = ifelse(guesses<=correct_answer, "Underestimated", "Overestimated")) %>%
  arrange(deviation)

ggplot(data.5, aes(order, deviation)) +
    geom_point(aes(color = factor(data.5$over.under)), size=2) +
    geom_smooth(method="lm", color="hotpink") +
    labs(y="Deviation (absolute value)", color="")

If crowds were people

We can rank the accuracy of the crowd’s widsom against the pool of 105 guessers. The winner (woo Lauren!) guessed $444.44. The crowd’s estimate of $430.18 was second best. In other words, it outperformed more than 99% of humans!

The table below shows the guesses of people who did not go over the correct answer (Price is Right rules) ranked by their accuracy.

data %>%
  select(guesses) %>%
  filter(guesses<correct_answer) %>% #price is right rules, so you can't go over...
  mutate(deviation = abs(guesses-correct_answer)) %>%
  arrange(deviation) %>%
  head() %>%
  kable("html") %>%
  kable_styling(full_width=F, position = "left", bootstrap_options = "striped")

guesses	deviation
444.44	3.06
427.50	20.00
426.10	21.40
426.00	21.50
422.00	25.50
420.69	26.81

Also see: https://www.iarpa.gov/index.php/newsroom/iarpa-in-the-news/2015/439-the-good-judgment-project ↩
All plots and analyses exclude guesses more than two standard deviations below the average, which works out to be guesses below $68. This gets rid of some guesses which were likely not intended to be accurate, like “$1”.↩