TL;DR: Pollsters shouldn't ask survey participants about their own views or preferences; they should learn from the success of prediction markets and instead ask participants about the views or preferences of others. Why? A simulation study shows that, on the latter approach, errors tend to be substantially lower for exactly the type of small samples that pollsters typically use -- especially under realistic conditions of biased sampling, and despite people often being mistaken about the preferences and views of others and largely surrounding themselves with people who share their preferences. This also offers a particularly parsimonious explanation for the relative accuracy of prediction markets.
1. The right (and the wrong) question
2. The basic idea
3. The details
4. Factoring in biased sampling
5. Factoring in egocentric bias
6. Factoring in homophily
7. Concluding remarks
References
"...pollsters should learn from the success of prediction markets by asking participants, not about their own views or preferences, but about the views and preferences of others."
2016 was not a proud year for political opinion polling. All the more embarrassing, then, that 2020 was not massively better (Cohn (2020)), even if the polls this time at least called the winner correctly. This piece will not attempt anything like a post mortem -- others will be better placed to do that. Rather, it will offer -- and then substantiate empirically using a simulation study -- an observation: pollsters do poorly at least in part because they are asking the wrong question.
Specifically, the wrong question to ask is (some version of) the first-person question "How will/would you vote?", as opposed to the third-person question "How do you think people in general will vote?" Indeed, the one high-profile poll that did ask a version of the latter question -- the USC Dornsife Daybreak Poll, building on work by Galesic et al. (2018) -- did better than the vast majority of polls that asked the former, in being less overoptimistic about a Biden win (Galesic and Bruine de Bruin (2020)). Similar results can be found in Rothschild and Wolfers (2011).
For anyone familiar with the predictive success of prediction markets for electoral outcomes (e.g., Berg and Rietz (2014); Rothschild (2009); Berg, Nelson, and Rietz (2008)), this should not come as a surprise. People betting on such markets are not asked about their own preferences; they're asked to predict how people will vote (i.e., what they prefer) in the aggregate. And while by no means infallible (see, e.g., Graefe (2017) and Ahlstrom-Vij (2016)), they tend to outperform election polls -- including in 2020 (Vaughan Williams (2020)).
Which brings us to the central message of this piece: pollsters should learn from the success of prediction markets by asking participants, not about their own views or preferences, but about the views and preferences of others. And as we shall see, this message holds true especially under realistic conditions of biased sampling, and -- notably -- even when people are to a fairly high degree mistaken about the preferences and views of others, and largely associate with people who share their preferences.
"...because of the connection between A and B, we get B's preference 'for free' when we sample A. As a result, the error of our estimate is lower on the third-person approach right out of the gate."
We'll get into the details below about why we should expect this to be the case, and why we can expect a significant factor to be that prediction markets, like good opinion polls, ask a third-person as opposed to a first-person question. But the basic idea can be illustrated fairly easily. Consider a simple case, involving three people, A, B, and C, with some preference on some binary policy matter X (blue for "for" and white for "against"):
The proportion of support for the policy in this case is 66% (or 2/3 to be exact). If we're a traditional pollster and we sample [A], then [A and B], and then [A, B, and C], in each case asking "Do you support policy X?", we'll reach the conclusion that the level of support is 100%, 50%, and 66%, respectively. This all makes sense given established polling practices of preferring larger to smaller samples, and of using sampling and weighting techniques that render their samples more representative of the population, since even large samples won't come close to encompassing the whole population.
That's all well-trodden territory. But now consider the edge between A and B in the above plot. Let that one designate that A knows B's preference on policy X. And let's imagine now that we ask, not the traditional first-person question, but the third-person question of each person sampled: "What proportion of people are supportive of policy X?" If we again sample [A], then [A and B], and finally [A, B, and C], we get the conclusion that the level of support is 50%, 25%, and 50%, respectively (assuming that A averages her own and B's preferences).
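To make the arithmetic concrete, here is a minimal R sketch of the example (assuming, as the estimates above imply, that A and C are for and B is against, with A knowing B's preference):

# Preferences: A and C are for (1), B is against (0); A also knows B's preference.
pref <- c(1, 0, 1)

# Each person's perceived level of support: A averages her own and B's preference,
# while B and C only know their own.
est <- c(mean(pref[1:2]), pref[2], pref[3])

mean(pref)                                # true proportion: 2/3, i.e. ~66%
sapply(1:3, function(k) mean(pref[1:k]))  # first-person estimates: 1.00, 0.50, 0.67
sapply(1:3, function(k) mean(est[1:k]))   # third-person estimates: 0.50, 0.25, 0.50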
There are two things to note here, both of which we'll dig into more deeply below:
First, because of the connection between A and B, we get B's preference "for free" when we sample A. As a result, the error of our estimate is lower on the third-person approach right out of the gate: 16 percentage points off, compared to 34 on the first-person approach. As we shall see, this is not some quirk of this particular example, but a feature of the third-person approach: where people know others' preferences, the third-person approach consistently outperforms the first-person approach in small samples (Section 3). This turns out to be especially so in cases where random sampling fails (Section 4), and despite people being fallible about other people's preferences (Section 5) and largely surrounding themselves with people who share their preferences (Section 6).
Second, while on the first-person approach we can trust that sampling the entire population will guarantee an accurate estimate, this is interestingly not the case on the third-person approach. It would have been the case had all "connections" between people been bi-directional -- that is, if my knowing your preferences invariably meant you knowing mine. (Try it yourself in the case above.) But such bi-directionality would be an implausible assumption, so it won't be made in what follows. And as we shall see below (Section 4 and onwards), when random sampling fails, the third-person approach still remains the more accurate method virtually up to the point where it's an option to sample the entire population -- which it rarely is.
"...the third-person approach outperforms the first-person approach in small samples, and more decisively so the higher the level of connectedness."
But let's get into the details. Specifically, let's start by looking at the first feature mentioned above, i.e., that the third-person approach outperforms the first-person approach in small samples. We'll want to make sure that its doing so in the simple case above didn't have anything to do with the particular proportion of support in that case. We'll also want to see to what extent it holds across both sample size and the degree of connectedness between people.
To that end, let's run a simulation. The below R code does the following:
- It simulates a number of populations (set by iterations) of 1,000 persons each (set by n).
- Each population is assigned a randomly chosen proportion of support (prop) for an imagined policy (1 = for; 0 = against), in the range of 0-100%.
- Each person is randomly assigned some (num_neighbours) number of "neighbours", where each such person knows their own preference as well as that of their neighbours.
- For each sample size from 1% to 100% of the population, it draws a number of samples (set by number_of_samples), and looks at both the mean level of support in that sample, and the mean perceived level of support in that sample. The former corresponds to asking each person sampled whether they are for or against the policy (and taking the mean response); the latter corresponds to asking each person what proportion of the population they think supports the policy (and, again, taking the mean response). This corresponds, of course, to the first- and the third-person approach, respectively.
- The resulting mean absolute errors are saved (in means) for purposes of comparing the first- and third-person approach.

The above steps are then repeated for 1, 2, 5, 10, 50, and 100 neighbours, corresponding to each person knowing the preferences of 0.1% to 10% of the population. In the snippet below, I've only included the code for num_neighbours <- 1 for the sake of brevity. The full script can be downloaded here. (If you are looking to run the code, note that it might take a while, depending on your machine.)
n <- 1000
iterations <- 500
number_of_samples <- 100
master_first_person <- data.frame(matrix(NA, nrow = 100, ncol = iterations))
master_third_person <- data.frame(matrix(NA, nrow = 100, ncol = iterations))
set.seed(101)
num_neighbours <- 1
for (i in 1:iterations) {
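    # Randomly chosen parameter for this population's preference distribution;
    # the realised level of support is computed further down as true_prop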
prop <- sample(1:100, 1)/100
df <- data.frame(pref = sample(0:1, size = n, replace = TRUE,
prob = c(prop, 1 - prop)))
df$est <- NA
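    # For each person j, draw num_neighbours random 'neighbours' and store j's
    # perceived level of support: the mean of j's own and the neighbours' preferences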
for (j in 1:length(df$pref)) {
neighbours <- df[sample(nrow(df[-j, ]), num_neighbours,
replace = FALSE), ]
neighbours <- rbind(neighbours, df[j, ])
df$est[j] <- mean(neighbours$pref)
}
    accuracy_df <- data.frame(sample_size = 1:100, first_person_est = NA,
        third_person_est = NA)
accuracy_df$sample_size <- accuracy_df$sample_size/100
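    # For each sample size (1% to 100% of the population), draw number_of_samples
    # samples and average the first-person (pref) and third-person (est) estimates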
for (k in 1:length(accuracy_df$sample_size)) {
temp_standard <- NA
temp_third_person <- NA
for (l in 1:number_of_samples) {
my_sample <- df[sample(nrow(df), length(df$pref) *
accuracy_df$sample_size[k], replace = FALSE),
]
temp_standard <- append(temp_standard, mean(my_sample$pref,
na.rm = TRUE))
temp_third_person <- append(temp_third_person, mean(my_sample$est,
na.rm = TRUE))
}
accuracy_df$first_person_est[k] <- mean(temp_standard,
na.rm = TRUE)
accuracy_df$third_person_est[k] <- mean(temp_third_person,
na.rm = TRUE)
}
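    # Record each approach's absolute error against the realised proportion of support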
true_prop <- mean(df$pref)
master_first_person[, i] <- abs(accuracy_df$first_person_est -
true_prop)
master_third_person[, i] <- abs(accuracy_df$third_person_est -
true_prop)
}
means <- data.frame(sample_size = (1:100)/100, first_person_acc = rowMeans(master_first_person),
third_person_acc_1 = rowMeans(master_third_person))
...

If we then plot the average error across the 100 samples for each sample size, we see that the error of the third-person approach compares to the first-person approach as follows:
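The plotting code is omitted from the snippet above; a minimal sketch for the one-neighbour case -- assuming the means data frame produced by the script, and using ggplot2 and tidyr (the full script may plot differently) -- could look like this:

library(ggplot2)
library(tidyr)

# Reshape to long format so that each approach becomes its own line in the plot
means_long <- pivot_longer(means, cols = c(first_person_acc, third_person_acc_1),
    names_to = "approach", values_to = "error")

ggplot(means_long, aes(x = sample_size, y = error, colour = approach)) +
    geom_line() +
    labs(x = "Sample size (proportion of population)", y = "Mean absolute error",
        colour = "Approach")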
We see here what we observed already in the simple, three-person case introduced at the outset (in Section 2): the third-person approach outperforms the first-person approach in small samples, and more decisively so the higher the level of connectedness; and sampling the entire population does not tend to remove all error on the third-person approach. On the point about small samples, it helps to zoom in a bit, by looking specifically at sample sizes in the range of 1-20%:
Even such a small degree of connectedness as each person having two "neighbours" sees the third-person approach outperform the first-person approach up to samples of about 2.5% of the population; the same goes for samples of about 6% with five "neighbours", about 12% with 10 "neighbours", and (zooming back out to the full graph above) about 37% with 50 "neighbours" and 50% with 100 "neighbours".
That bodes well for the third-person approach, given that we are rarely able to achieve large samples in polling contexts. At the same time, this simulation assumes random sampling -- an assumption that's not particularly realistic in light of selection bias (i.e., some groups of potential respondents being more likely to be sampled than others), nonresponse bias (i.e., some types of sampled respondents being more likely to respond than others), and the like. So, in the next section, we relax this assumption to see what then happens to the comparative merits of the two approaches.
"...under the more realistic assumption of biased as opposed to random sampling, the third-person approach clearly outperforms the first-person approach, and it does so almost all the way up to samples encompassing the entire population."
As noted, we are rarely able to achieve perfectly random sampling -- some participants will inevitably be more likely than others to end up being represented in the sample, e.g., on account of selection or nonresponse bias. Let us factor that into our simulation, by adding the following to our simulation script:
- A degree of sampling bias is introduced (bias), drawn at random in the range of 1-99%, such that every person not supporting the policy (pref==0) has a probability of bias of being sampled, and everyone supporting it (pref==1) has a probability of 1-bias of being sampled.

Beyond that, the code remains as above. Here, too, I'm only including the code for num_neighbours <- 1 for brevity. The full script can be downloaded here.
n <- 1000
iterations <- 500
number_of_samples <- 100
master_first_person <- data.frame(matrix(NA, nrow = 100, ncol = iterations))
master_third_person <- data.frame(matrix(NA, nrow = 100, ncol = iterations))
set.seed(101)
num_neighbours <- 1
for (i in 1:iterations) {
prop <- sample(1:100, 1)/100
df <- data.frame(pref = sample(0:1, size = n, replace = TRUE,
prob = c(prop, 1 - prop)))
df$est <- NA
for (j in 1:length(df$pref)) {
neighbours <- df[sample(nrow(df[-j, ]), num_neighbours,
replace = FALSE), ]
neighbours <- rbind(neighbours, df[j, ])
df$est[j] <- mean(neighbours$pref)
}
accuracy_df <- data.frame(sample_size = 1:100, first_person_est = NA,
third_person_est = NA)
accuracy_df$sample_size <- accuracy_df$sample_size/100
for (k in 1:length(accuracy_df$sample_size)) {
temp_standard <- NA
temp_third_person <- NA
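        # Biased sampling: non-supporters (pref == 0) get sampling weight bias,
        # supporters (pref == 1) get weight 1 - bias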
bias <- sample(1:99, 1, replace = TRUE)/100
df$bias <- NA
df$bias[df$pref == 0] <- bias
df$bias[df$pref == 1] <- 1 - bias
for (l in 1:number_of_samples) {
biased_sample <- df[sample(nrow(df), length(df$pref) *
accuracy_df$sample_size[k], replace = FALSE,
prob = df$bias), ]
temp_standard <- append(temp_standard, mean(biased_sample$pref,
na.rm = TRUE))
temp_third_person <- append(temp_third_person, mean(biased_sample$est,
na.rm = TRUE))
}
accuracy_df$first_person_est[k] <- mean(temp_standard,
na.rm = TRUE)
accuracy_df$third_person_est[k] <- mean(temp_third_person,
na.rm = TRUE)
}
true_prop <- mean(df$pref)
master_first_person[, i] <- abs(accuracy_df$first_person_est -
true_prop)
master_third_person[, i] <- abs(accuracy_df$third_person_est -
true_prop)
}
means <- data.frame(sample_size = (1:100)/100, first_person_acc = rowMeans(master_first_person),
third_person_acc_1 = rowMeans(master_third_person))
...

Let's then compare the levels of accuracy between the first-person and the third-person approach:
Look at small sample sizes first. Even for cases where each person has only a single "neighbour", the third-person approach cuts the error of the first-person approach in half for very small samples. More generally, we see that, under the more realistic assumption of biased as opposed to random sampling, the third-person approach clearly outperforms the first-person approach, and it does so almost all the way up to samples encompassing the entire population. This can be seen more clearly in the graph below, which zooms in on the 90-100% sample range:
"...even widespread and substantial egocentric bias doesn't undo the advantage that the third-person approach has over the first-person approach under circumstances of biased sampling."
It might be objected that what's driving the results in the previous section is the (so far implicit) assumption that each person has perfect knowledge of the preferences of their "neighbours". This is, of course, yet another idealisation: in reality, we tend to exhibit an egocentric bias (L. Ross, Greene, and House 1977) in identifying others' preferences, meaning that we tend to assume that others prefer more or less what we do -- a phenomenon that has also been observed on prediction markets (Forsythe, Rietz, and Ross 1999). Indeed, the tendency to project our own states onto others is not restricted to preferences, but extends to our knowledge (Bernstein et al. 2004), and even our thirst (Van Boven and Loewenstein 2003).
Let us factor that into our simulation. Specifically, let us add the following to our code above:
- Each person correctly identifies a given neighbour's preference only slightly more than half of the time (set by discernment, here 0.51), and otherwise simply ascribes to them their own preference.

In other words, we are simulating a very high degree of egocentric bias, where each participant barely beats chance when it comes to correctly identifying the preferences of others. Beyond that, the code remains as above. The full script can be downloaded here.
n <- 1000
iterations <- 500
number_of_samples <- 100
discernment <- 0.51
master_first_person <- data.frame(matrix(NA, nrow = 100, ncol = iterations))
master_third_person <- data.frame(matrix(NA, nrow = 100, ncol = iterations))
set.seed(101)
num_neighbours <- 1
for (i in 1:iterations) {
prop <- sample(1:100, 1)/100
df <- data.frame(pref = sample(0:1, size = n, replace = TRUE,
prob = c(prop, 1 - prop)))
df$est <- NA
for (j in 1:length(df$pref)) {
neighbours <- df[sample(nrow(df[-j, ]), num_neighbours,
replace = FALSE), ]
neighbours$perceived_pref <- NA
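        # Egocentric bias: j identifies each neighbour's true preference with probability
        # discernment, and otherwise projects their own preference onto the neighbour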
for (m in 1:length(neighbours$pref)) {
neighbours$perceived_pref[m] <- sample(c(neighbours$pref[m],
df$pref[j]), 1, prob = c(discernment, 1 - discernment))
}
neighbours <- rbind(neighbours, cbind(df[j, ], perceived_pref = df[j,
]$pref))
df$est[j] <- mean(neighbours$perceived_pref)
}
accuracy_df <- data.frame(sample_size = 1:100, first_person_est = NA,
third_person_est = NA)
accuracy_df$sample_size <- accuracy_df$sample_size/100
for (k in 1:length(accuracy_df$sample_size)) {
temp_standard <- NA
temp_third_person <- NA
bias <- sample(1:99, 1, replace = TRUE)/100
df$bias <- NA
df$bias[df$pref == 0] <- bias
df$bias[df$pref == 1] <- 1 - bias
for (l in 1:number_of_samples) {
biased_sample <- df[sample(nrow(df), length(df$pref) *
accuracy_df$sample_size[k], replace = FALSE,
prob = df$bias), ]
temp_standard <- append(temp_standard, mean(biased_sample$pref,
na.rm = TRUE))
temp_third_person <- append(temp_third_person, mean(biased_sample$est,
na.rm = TRUE))
}
accuracy_df$first_person_est[k] <- mean(temp_standard,
na.rm = TRUE)
accuracy_df$third_person_est[k] <- mean(temp_third_person,
na.rm = TRUE)
}
true_prop <- mean(df$pref)
master_first_person[, i] <- abs(accuracy_df$first_person_est -
true_prop)
master_third_person[, i] <- abs(accuracy_df$third_person_est -
true_prop)
}
means <- data.frame(sample_size = (1:100)/100, first_person_acc = rowMeans(master_first_person),
third_person_acc_1 = rowMeans(master_third_person))
...

Let's have a look at the impact of this high degree of egocentric bias on the comparative merits of the first- and third-person approach:
As is to be expected, the degree of error goes up for the third-person approach under these circumstances. But that error is still lower than for the first-person approach, suggesting that even widespread and substantial egocentric bias doesn't undo the advantage that the third-person approach has over the first-person approach under circumstances of biased sampling.
"...the fact that the third-person approach does not 'break' under these assumptions is really rather noteworthy, and speaks to the aggregate power of even very low levels of social connection."
It is well-known that people tend to socialise and affiliate themselves with people who are like them -- as the saying goes, "birds of a feather flock together". In people, such flocking tends to happen around demographic factors like gender, race, ethnicity, religion, education, social class, and occupation (see McPherson, Smith-Lovin, and Cook (2001) for a classic overview), as well as social characteristics like political affiliation (Ackland and Shorish (2014)). Indeed, people even date (Huber and Malhotra (2017)) and mate (Alford et al. (2011)) in political clusters.
This has implications for the models above, since they sample neighbours randomly from the population. If people cluster together in ways that are not random -- as the fact of homophily suggests -- this is too much of a simplification. So, let's add the following to our simulation script:
- A parameter is introduced (bubble), designating the proportion of like-minded neighbours (i.e., neighbours of the same preference) that each person will tend to have. So, if bubble <- 1, for example, then each person only interacts with like-minded people.
- Each person's neighbours are accordingly sampled such that like-minded people are selected with a probability of bubble and not like-minded ones with a probability of 1-bubble.

To get a feel for the impact of different levels of homophily on the errors of the first- and third-person approach, we run our simulation for bubble values of 99%, 90%, 75%, and 50%. (50% corresponds to no homophily; bubble values below 50% would designate heterophily, whereby 'opposites attract'.) Below, only the code for bubble <- 0.99 and num_neighbours <- 1 is included, for brevity. The full script can be downloaded here.
n <- 1000
iterations <- 500
number_of_samples <- 100
discernment <- 0.51
bubble <- 0.99
master_first_person <- data.frame(matrix(NA, nrow = 100, ncol = iterations))
master_third_person <- data.frame(matrix(NA, nrow = 100, ncol = iterations))
set.seed(101)
num_neighbours <- 1
for (i in 1:iterations) {
prop <- sample(1:100, 1)/100
df <- data.frame(pref = sample(0:1, size = n, replace = TRUE,
prob = c(prop, 1 - prop)))
df$est <- NA
for (j in 1:length(df$pref)) {
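        # Homophily: people sharing j's preference are weighted by bubble when
        # sampling j's neighbours, others by 1 - bubble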
pref_of_j <- df$pref[j]
df$prob[df$pref == pref_of_j] <- bubble
df$prob[df$pref != pref_of_j] <- 1 - bubble
neighbours_indices <- sample(c(rownames(df[df$pref ==
pref_of_j, ]), rownames(df[df$pref != pref_of_j,
])), num_neighbours, replace = FALSE, prob = c(df$prob[df$pref ==
pref_of_j], df$prob[df$pref != pref_of_j]))
neighbours <- df[neighbours_indices, ]
neighbours$perceived_pref <- NA
for (m in 1:length(neighbours$pref)) {
neighbours$perceived_pref[m] <- sample(c(neighbours$pref[m],
df$pref[j]), 1, prob = c(discernment, 1 - discernment))
}
neighbours <- rbind(neighbours, cbind(df[j, ], perceived_pref = df[j,
]$pref))
df$est[j] <- mean(neighbours$perceived_pref)
}
accuracy_df <- data.frame(sample_size = 1:100, first_person_est = NA,
third_person_est = NA)
accuracy_df$sample_size <- accuracy_df$sample_size/100
for (k in 1:length(accuracy_df$sample_size)) {
temp_standard <- NA
temp_third_person <- NA
bias <- sample(1:99, 1, replace = TRUE)/100
df$bias <- NA
df$bias[df$pref == 0] <- bias
df$bias[df$pref == 1] <- 1 - bias
for (l in 1:number_of_samples) {
biased_sample <- df[sample(nrow(df), length(df$pref) *
accuracy_df$sample_size[k], replace = FALSE,
prob = df$bias), ]
temp_standard <- append(temp_standard, mean(biased_sample$pref,
na.rm = TRUE))
temp_third_person <- append(temp_third_person, mean(biased_sample$est,
na.rm = TRUE))
}
accuracy_df$first_person_est[k] <- mean(temp_standard,
na.rm = TRUE)
accuracy_df$third_person_est[k] <- mean(temp_third_person,
na.rm = TRUE)
}
true_prop <- mean(df$pref)
master_first_person[, i] <- abs(accuracy_df$first_person_est -
true_prop)
master_third_person[, i] <- abs(accuracy_df$third_person_est -
true_prop)
}
means <- data.frame(sample_size = (1:100)/100, first_person_acc = rowMeans(master_first_person),
third_person_acc_1 = rowMeans(master_third_person))
...

Let's have a look at the impact of these four different levels of homophily on the comparative merits of the first- and third-person approach:
Looking at the top-left panel first, we see that the difference between the first- and third-person approach disappears once homophily reaches 99% -- i.e., at the point where there is virtually no engagement whatsoever between persons who hold different preferences. This makes sense, of course, because it is at exactly that point that the edge the third-person approach gains from tapping into connections between people of different preferences vanishes.
In terms of the other three panels, we see -- as is to be expected -- that lower levels of homophily mean a greater reduction in error for the third-person approach, compared to the first-person approach, with 50% homophily (i.e., no homophily, since people are equally likely to associate with like-minded as with not like-minded people) being identical to the results in the previous section.
It is worth looking at the top-right panel in particular, with homophily at 90%, and zooming in on small sample sizes (0-20%).
What we see here is that, despite (a) agents barely beating chance in estimating the preferences of others, on account of a strong egocentric bias, and (b) their social circles (i.e., their neighbours) consisting almost exclusively of people who share their preferences, owing to a very high level of homophily -- two facts that, in combination, more or less ensure that people estimate their neighbours' preferences to be identical to their own -- the third-person approach still outperforms the first-person approach by a small margin, even in cases where people only have one neighbour.
It is worth noting exactly how 'hostile' these modeling assumptions are to the third-person approach. Generally, the more hostile the assumptions made in modeling, the more robust we can expect any results to be, since this reduces the chance that they are a mere artifact of conveniently set parameter values. And in our particular case, the fact that the third-person approach does not 'break' under these assumptions is really rather noteworthy, and speaks to the aggregate power of even very low levels of social connection.
"...the above simulations raise the intriguing possibility that, at least when it comes to prediction markets tending to outperform polls, one important explanatory factor is simply that prediction market makers in effect ask participants the 'right' question."
If the above is on point, pollsters would do well not to ask survey participants about their own views or preferences, but to learn from the success of prediction markets by instead asking participants about the preferences or views of others. Indeed, as the simulations above illustrate, on the latter approach, errors tend to be substantially lower for exactly the type of small sample sizes that pollsters typically use. Interestingly, this holds true especially under realistic conditions of biased sampling, and even if people barely beat chance when it comes to correctly identifying the preferences (or views) of others, and largely surround themselves with people who share their preferences to begin with.
It is worth noting that this also offers a particularly parsimonious explanation of the accuracy of prediction market estimates (see also Rothschild and Wolfers (2011)). Note three things in particular:
It is tempting to simply point to financial incentives as the main driver of accuracy on such markets. However, the accuracy difference between play- and real-money markets turns out to be either non-existent (Servan‐Schreiber et al. 2004) or small and context dependent (Mchugh and Jackson 2012).
Relatedly, the particular way in which prediction markets are typically resolved -- by rewarding participants with explicit reference to some external outcome (e.g., of an election) -- doesn't seem to explain their accuracy, as self-resolving markets seem to perform equally well (Ahlstrom-Vij 2019).
Moreover, standard "wisdom of crowds" accounts (e.g., Surowiecki (2005)) fail to explain the full range of scenarios in which prediction markets generate accurate outputs (see, e.g., Ahlstrom-Vij (2016) and Hanson (2013)).
In this context, the above simulations raise the intriguing possibility that, at least when it comes to prediction markets tending to outperform polls, one important explanatory factor is simply that prediction market makers in effect ask participants the "right" question.
References

Ackland, Robert, and Jamsheed Shorish. 2014. “Political Homophily on the Web.” In Analyzing Social Media Data and Web Networks, edited by Marta Cantijoch, Rachel Gibson, and Stephen Ward, 25–46. London: Palgrave Macmillan UK. doi:10.1057/9781137276773_2.
Ahlstrom-Vij, Kristoffer. 2016. “Information Markets.” In A Companion to Applied Philosophy, 89–102. John Wiley & Sons, Ltd. doi:10.1002/9781118869109.ch7.
———. 2019. “Self-Resolving Information Markets: A Comparative Study.” Journal of Prediction Markets, February. doi:10.5750/jpm.v13i1.1687.
Alford, John R., Peter K. Hatemi, John R. Hibbing, Nicholas G. Martin, and Lindon J. Eaves. 2011. “The Politics of Mate Choice.” The Journal of Politics 73 (2): 362–79. doi:10.1017/S0022381611000016.
Berg, Joyce E., and Thomas A. Rietz. 2014. “Market Design, Manipulation, and Accuracy in Political Prediction Markets: Lessons from the Iowa Electronic Markets.” PS: Political Science and Politics 47 (2). Cambridge University Press: 293–96. doi:10.1017/S1049096514000043.
Berg, Joyce E., Forrest D. Nelson, and Thomas A. Rietz. 2008. “Prediction Market Accuracy in the Long Run.” International Journal of Forecasting 24 (2): 285–300. https://EconPapers.repec.org/RePEc:eee:intfor:v:24:y:2008:i:2:p:285-300.
Bernstein, Daniel M, Cristina Atance, Geoffrey R Loftus, and Andrew Meltzoff. 2004. “We Saw It All Along: Visual Hindsight Bias in Children and Adults.” Psychological Science 15 (4): 264–67. doi:10.1111/j.0963-7214.2004.00663.x.
Cohn, Nate. 2020. “What Went Wrong with Polling? Some Early Theories.” New York Times, November. https://www.nytimes.com/2020/11/10/upshot/polls-what-went-wrong.html.
Forsythe, Robert, Thomas A Rietz, and Thomas W Ross. 1999. “Wishes, Expectations and Actions: A Survey on Price Formation in Election Stock Markets.” Journal of Economic Behavior & Organization 39 (1): 83–110. doi:10.1016/S0167-2681(99)00027-X.
Galesic, M., and W. Bruine de Bruin. 2020. “Election Polls Are More Accurate If They Ask Participants How Others Will Vote.” The Conversation, November. https://theconversation.com/election-polls-are-more-accurate-if-they-ask-participants-how-others-will-vote-150121.
Galesic, M., W. Bruine de Bruin, M. Dumas, A. Kapteyn, J. E. Darling, and E. Meijer. 2018. “Asking About Social Circles Improves Election Predictions.” Nature Human Behaviour 2 (3): 187–93. doi:10.1038/s41562-018-0302-y.
Graefe, Andreas. 2017. “Prediction Market Performance in the 2016 U.S. Presidential Election.” Foresight: The International Journal of Applied Forecasting, no. 45 (Spring): 38–42. https://ideas.repec.org/a/for/ijafaa/y2017i45p38-42.html.
Hanson, Robin. 2013. “Shall We Vote on Values, but Bet on Beliefs?” Journal of Political Philosophy 21 (2): 151–78. doi:10.1111/jopp.12008.
Huber, Gregory A., and Neil Malhotra. 2017. “Political Homophily in Social Relationships: Evidence from Online Dating Behavior.” The Journal of Politics 79 (1): 269–83. doi:10.1086/687533.
Mchugh, Patrick, and Aaron Jackson. 2012. “Prediction Market Accuracy: The Impact of Size, Incentives, Context and Interpretation.” Journal of Prediction Markets 6 (2): 22–46. https://EconPapers.repec.org/RePEc:buc:jpredm:v:6:y:2012:i:2:p:22-46.
McPherson, M., L. Smith-Lovin, and J. M. Cook. 2001. “Birds of a Feather: Homophily in Social Networks.” Review of Sociology 27: 415–44.
Ross, Lee, David Greene, and Pamela House. 1977. “The ‘False Consensus Effect’: An Egocentric Bias in Social Perception and Attribution Processes.” Journal of Experimental Social Psychology 13 (3): 279–301. doi:10.1016/0022-1031(77)90049-X.
Rothschild, David. 2009. “Forecasting Elections: Comparing Prediction Markets, Polls, and Their Biases.” The Public Opinion Quarterly 73 (5): 895–916. http://www.jstor.org/stable/40467652.
Rothschild, David, and Justin Wolfers. 2011. “Forecasting Elections: Voter Intentions Versus Expectations.” SSRN Electronic Journal, July. doi:10.2139/ssrn.1884644.
Servan‐Schreiber, Emile, Justin Wolfers, David M. Pennock, and Brian Galebach. 2004. “Prediction Markets: Does Money Matter?” Electronic Markets 14 (3). Routledge: 243–51. doi:10.1080/1019678042000245254.
Surowiecki, James. 2005. The Wisdom of Crowds. New York: Anchor Books.
Van Boven, Leaf, and George Loewenstein. 2003. “Social Projection of Transient Drive States.” Personality and Social Psychology Bulletin 29 (9): 1159–68. doi:10.1177/0146167203254597.
Vaughan Williams, Leighton. 2020. “Joe Biden: How Betting Markets Foresaw the Result of the 2020 US Election.” The Conversation, November. https://theconversation.com/joe-biden-how-betting-markets-foresaw-the-result-of-the-2020-us-election-150095.