In the first section I construct a result, and in the second I destroy it.
Our questionnaire contained 12 items from the juror bias scale, each of which asked for a response between 1 and 6. A respondent’s score is the sum of the items, with higher scores indicating greater bias toward the prosecution. Looking at the distribution of scores, we see that our subjects are by and large below the midpoint of the scale (which is 42), and we have no one close to the maximum score (which is 72):
For all subsequent analyses we will use the z-scored juror bias rather than the raw score.
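For concreteness, here is a minimal sketch of the scoring and standardization just described, with a randomly generated response matrix standing in for the real data:

```python
import numpy as np

rng = np.random.default_rng(0)
items = rng.integers(1, 7, size=(100, 12))  # placeholder: 100 subjects x 12 items, responses 1-6

bias_raw = items.sum(axis=1)                            # raw scores, possible range 12-72
bias_z = (bias_raw - bias_raw.mean()) / bias_raw.std()  # z-scored juror bias
```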
The obvious first question is: does higher bias predict higher case strength ratings and/or higher probabilities of voting guilty? Comparing each respondent’s average ratings/vote rates to their juror bias, we do indeed see a modest positive relationship between the two:
(for completeness, the correlation with rating is significant while that with voting is not, but please do not take that seriously)
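For reference, here is a sketch of that per-subject comparison in code. The column names and synthetic data below are placeholders for the real dataset (`guilty` stands in for the binary vote), so the printed correlations will not reproduce the actual pattern:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic long-format data: one row per (subject, case) judgment.
rng = np.random.default_rng(1)
n_subj, n_trials = 100, 20
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_trials),
    "bias_z": np.repeat(rng.normal(size=n_subj), n_trials),
    "n_inculpatory": rng.integers(0, 4, n_subj * n_trials),
    "n_exculpatory": rng.integers(0, 4, n_subj * n_trials),
})
df["rating"] = df["n_inculpatory"] - df["n_exculpatory"] + rng.normal(size=len(df))
df["guilty"] = (df["rating"] + rng.normal(size=len(df)) > 0).astype(int)

# Per-subject average rating and guilty-vote rate vs. juror bias.
per_subj = df.groupby("subject").agg(
    mean_rating=("rating", "mean"),
    vote_rate=("guilty", "mean"),
    bias_z=("bias_z", "first"),
)
print(stats.pearsonr(per_subj["bias_z"], per_subj["mean_rating"]))
print(stats.pearsonr(per_subj["bias_z"], per_subj["vote_rate"]))
```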
So far, so unsurprising. However, can we figure out more precisely what is driving this relationship? Is juror bias associated with different baselines, different evidence effects, or both? As a first pass we’ll group respondents according to a tertile split on the bias scale and, within each level of bias, look at average ratings/votes as a function of the balance of inculpatory vs. exculpatory evidence:
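Continuing with the synthetic `df` from the sketch above, the split-and-average step behind this kind of plot is roughly:

```python
import pandas as pd

# Tertile split on z-scored bias, then mean rating / guilty-vote rate
# by evidence balance within each bias group.
df["bias_group"] = pd.qcut(df["bias_z"], q=3, labels=["low", "mid", "high"])
df["balance"] = df["n_inculpatory"] - df["n_exculpatory"]
print(df.groupby(["bias_group", "balance"], observed=True)[["rating", "guilty"]].mean())
```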
This is where things get more interesting. When the evidence balance is exculpatory, higher juror bias predicts harsher judgments against the defendant, as we’d expect. However, when the evidence is on balance inculpatory, higher juror bias predicts judgments more favorable to the defendant.
We can go a step further and see how juror bias affects the weighting of individual classes of evidence by adding to our existing models interaction effects between evidence class and juror bias. We can visualize this in terms of the evidence weights predicted for a subject with high versus low juror bias, where high and low here mean one standard deviation above or below the mean, respectively:
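As a simplified sketch of the rating model behind this visualization (random intercepts only, whereas the real models presumably also carry per-subject random slopes on the evidence; the voting model would use a logistic analogue):

```python
import statsmodels.formula.api as smf

# Rating model with evidence-by-bias interaction terms and a per-subject random intercept.
fit = smf.mixedlm(
    "rating ~ (n_inculpatory + n_exculpatory) * bias_z",
    data=df, groups=df["subject"],
).fit()

# Evidence weights implied for a subject one SD above/below mean bias (bias_z = +/-1).
for ev in ["n_inculpatory", "n_exculpatory"]:
    slope, interaction = fit.params[ev], fit.params[f"{ev}:bias_z"]
    print(ev, "high bias:", slope + interaction, "low bias:", slope - interaction)
```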
We see that high bias predicts consistently lower weights on inculpatory evidence but – at least in voting responses – consistently higher weights on exculpatory evidence, the net effect being lower absolute weights on both kinds of evidence. This general blunting of evidence weights would naturally lead to the weaker relationship between evidence balance and judgments we observed in the high-bias group.
Note that not all, or even most, of the individual interactions between bias and evidence are “significant”. More achieve that rarefied status in the voting model than in the rating model, where the interactions with exculpatory evidence in particular are a bit muddled, but I think the plot of effect sizes above makes clear that we shouldn’t draw any strong conclusion from this observation.
It seems that the juror bias scale is picking up on some aspect of covariation among the evidence weights. This suggests a question we have not yet addressed: what is the pattern of covariation among the weights that subjects assign to different pieces of evidence? We know that subjects do seem to vary nontrivially in their evidence weights, but does that variance fall along any particular dimensions of, so to speak, juror “personality”? And how does juror bias relate to those dimensions? We can get at this question using that greatest and oldest friend of personality psychology, the factor analysis. Our particular implementation will be slightly fancier than normal, as we will build a factor model directly into the distribution of the per-subject random effects (plus juror bias) of our existing models, rather than estimating the per-subject evidence weights from one model and then post-hoc fitting a separate factor model to those estimates.
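The joint model is more than I can sketch here, but the cruder two-stage alternative just mentioned – estimate per-subject weights first, then factor-analyze those estimates together with juror bias – is easy to illustrate (again on the synthetic `df`):

```python
import numpy as np
import statsmodels.formula.api as smf
from sklearn.decomposition import FactorAnalysis

# Stage 1: per-subject evidence weights from separate regressions.
rows = []
for subj, d in df.groupby("subject"):
    ols = smf.ols("rating ~ n_inculpatory + n_exculpatory", data=d).fit()
    rows.append([ols.params["n_inculpatory"],
                 ols.params["n_exculpatory"],
                 d["bias_z"].iloc[0]])
weights = np.asarray(rows)
weights = (weights - weights.mean(axis=0)) / weights.std(axis=0)  # standardize columns

# Stage 2: one-factor model over (inculpatory weight, exculpatory weight, bias).
fa = FactorAnalysis(n_components=1).fit(weights)
print(fa.components_)  # loadings of the single factor
```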
Fitting the model while assuming only a single factor, we find that said factor has these loadings:
Each point is an individual loading of the factor. This mostly recapitulates what we’ve seen before, which is that higher juror bias is associated with lower exculpatory evidence weighting and higher inculpatory evidence weighting. This one factor explains about 30% (CI: [20%, 40%]ish) of the total variance in subjects’ evidence weightings. So, there’s still a substantial amount of idiosyncratic variance in evidence weighting left over.
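In the two-stage sketch above, the analogous share-of-variance computation would be the following (the ~30% figure in the text comes from the joint model, not from this):

```python
# Variance attributable to the factor vs. total (communal + residual noise).
communal = (fa.components_ ** 2).sum()
total = communal + fa.noise_variance_.sum()
print(communal / total)
```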
If we expand to two factors, we find the second factor loads thusly:
In other words, there is no second factor. Note that the juror bias and baseline loadings are constrained to be zero and positive, respectively, in order to enforce identifiability between factors. Whatever consistent structure there is in the pattern of how people weight evidence, it appears to be well indexed by the juror bias scale.
So this is all very interesting. However, as I type this the grim spectre of triviality has come to haunt my hypothesizing: blunted evidence weights could very well reflect disinterest in or inattention to the task rather than any kind of interesting psychological phenomenon, indicating only that some participants responded more randomly than others. This explanation can account for the relationship between evidence weighting and the juror bias scale because, as seen at the top of this report, the distribution of bias scores is centered well below the midpoint of the scale, and the midpoint is exactly the score expected under uniform random responding (an average of 3.5 per item across 12 items, i.e. 42). Therefore, being on the higher end of the scale both places you closer to the score expected under random responding and predicts blunted evidence weights. I do not currently know how we might rule out this null hypothesis.
We can crudely measure how random a subject’s ratings are with respect to the evidence by regressing their ratings against the evidence (in the manner we have been doing) and calculating their coefficient of determination, \(R^2\), which here acts as a (normalized) measure of the extent to which they are using the evidence versus ignoring it. Having done this, we can compare it to the juror bias scores:
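Concretely, still on the synthetic `df` and the `per_subj` frame from earlier:

```python
import statsmodels.formula.api as smf
from scipy import stats

# Per-subject R^2 of ratings regressed on the evidence counts.
r2 = {subj: smf.ols("rating ~ n_inculpatory + n_exculpatory", data=d).fit().rsquared
      for subj, d in df.groupby("subject")}
per_subj["r2"] = per_subj.index.map(r2)
print(stats.pearsonr(per_subj["r2"], per_subj["bias_z"]))
```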
Indeed, we see a modest relationship whereby lower determination by the evidence is associated with being higher on the juror bias scale – that is, in our particular sample, around the midpoint of the scale. This relationship seems to be driven by a smallish cluster with \(R^2\)s below about 0.4. If we simply chop those subjects off and rerun some of the previous visualizations, we obtain the following:
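The chop itself is just a filter on the per-subject \(R^2\)s:

```python
# Drop subjects whose ratings are essentially undetermined by the evidence.
keep = per_subj.index[per_subj["r2"] >= 0.4]
df_pruned = df[df["subject"].isin(keep)]
```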
If we rerun the factor model on the binary responses with this pruned sub-population, we obtain the following factor loadings:
That is to say, everything goes away when those eight subjects are removed.
To conclude, I can do no better than these words of the immortal bard:
Out, out, spurious finding!
Science’s but a walking shadow, a poor analysis,
that struts and frets its issue of the journal,
and then is cited no more. It is a result
reported by an idiot, full of uncorrected p-values,
signifying nothing.