Let’s start by viewing the distributions of case strength ratings as a function of guilt judgements:
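Something like the following ggplot2 call sketches that figure (this is not the original plotting code; the column names rating, subjguilt, and legalguilt are taken from the regression code further down):

library(ggplot2)
# sketch: stack the case strength distributions by the combination of
# the two (binary) guilt judgements
ggplot(dat, aes(x = rating, fill = interaction(subjguilt, legalguilt))) +
  geom_histogram(binwidth = 5)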
As you might hope, users almost never respond that they would vote to find a defendant guilty (legal guilt) unless they also say that they believe the defendant is guilty (subjective guilt) (indicated by the very small amount of green). Subjective and legal judgements almost always agree (indicated by the relatively small amount of blue), which is somewhat disturbing if you think that the standard of reasonable doubt should be substantially higher than that of subjective belief.
We can plot something like a population-level psychometric function for each kind of guilt judgement as a function of case strength, like so:
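One way to produce such a plot, as a sketch under the same column-name assumptions as above (again, not the original code):

library(data.table)
# reshape to one row per (trial, judgement type), then fit a logistic
# curve for each judgement type across all users
long <- melt(dat, id.vars = c("rating", "uid"),
             measure.vars = c("subjguilt", "legalguilt"))
ggplot(long, aes(rating, value, colour = variable)) +
  geom_smooth(method = "glm", method.args = list(family = binomial))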
Here it looks like asking about legal versus subjective guilt has two effects: first, it shifts the curve to the right, and second (and more subtly), it seems to sharpen the curve. We see the same effects if we do this a bit more rigorously and fit a proper mixed-effects logistic regression:
library(lme4)     # glmer
library(magrittr) # %>%
# melt() and := come from data.table, loaded above
glmer(value ~ 1 + rating*variable + (1 + rating*as.numeric(I(variable=="legalguilt")) || uid),
      family = binomial,
      data = melt(dat, id.vars = c("rating","uid"),
                  measure.vars = c("subjguilt","legalguilt"))[, rating := rating/100]) %>%
  summary()
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## value ~ 1 + rating * variable + (1 + rating * as.numeric(I(variable ==
## "legalguilt")) || uid)
## Data: melt(dat, id.vars = c("rating", "uid"), measure.vars = c("subjguilt",
## "legalguilt"))[, `:=`(rating, rating/100)]
##
## AIC BIC logLik deviance df.resid
## 2777 2829 -1380 2761 5016
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -13.863 -0.228 -0.017 0.173 9.954
##
## Random effects:
## Groups Name Variance Std.Dev.
## uid (Intercept) 1.937785 1.3920
## uid.1 rating 8.293724 2.8799
## uid.2 as.numeric(I(variable == "legalguilt")) 2.046765 1.4307
## uid.3 rating:as.numeric(I(variable == "legalguilt")) 0.000243 0.0156
## Number of obs: 5024, groups: uid, 81
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.936 0.244 -16.14 < 2e-16 ***
## rating 11.790 0.621 18.98 < 2e-16 ***
## variablelegalguilt -3.537 0.383 -9.24 < 2e-16 ***
## rating:variablelegalguilt 3.175 0.702 4.52 6.1e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) rating vrbllg
## rating -0.596
## varbllglglt -0.214 0.160
## rtng:vrbllg 0.335 -0.379 -0.832
## convergence code: 0
## Model failed to converge with max|grad| = 0.0123404 (tol = 0.001, component 1)
Here “rating” is the effect of going from 0 to 100 on the case strength scale, while “variablelegalguilt” is the difference between legal judgements and subjective judgements. The interaction effect “rating:variablelegalguilt” agrees with our (or at least, my) visual impression that case strength has a bigger impact on legal than on subjective judgements.
The shift of the curve has a clear interpretation – legal guilt requires a stronger case than subjective guilt. The sharpening of the curve when going from subjective to legal judgements is a bit more ambiguous. One interpretation is that with subjective guilt, people let factors other than case strength influence their judgement, whereas they focus more specifically on case strength when judging legal guilt.
Next we’ll look at the relationships among the legal severity of crimes, user ratings of crime punishability, and case strength. To examine the two response scales jointly, as well as the correlations between them, we’ll use a multivariate linear mixed-effects model. Note that this model treats responses as purely Gaussian, ignoring the bounded response scales and how people actually use those scales. Accordingly, predictions from this model won’t match the observed data, and exact numbers for effect sizes and credible intervals should perhaps not be taken too seriously. However, for questions of the form “do these two things have a positive or negative relationship?”, this simple model should be adequate.
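The model was fit with brms; here is a minimal sketch of the call, under the assumption that it matches the severity model printed at the end of this post minus the severity terms:

library(brms)
# |p| and |q| share the random-effect correlations across the two
# response variables; sampler settings copied from the output below
mvfit <- brm(
  mvbind(rating, rate_punishment) ~ 1 + physical + document + witness +
    character + (1 | p | uid) + (1 | q | scenario),
  data = dat, chains = 4, iter = 1500, warmup = 500, thin = 3
)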
Note that here “severity” simply means each crime’s position in the original spreadsheet. The spreadsheet also has a column called “severity” containing single letters or letter-number combinations, which I assume correspond to a formal legal classification, but I wasn’t sure what to do with those.
Crime severity is, unsurprisingly, strongly predictive of how severely people think a crime should be punished. Different crimes do also appear to have different case strength baselines, but the magnitude of these differences is much smaller than for punishment ratings, and there is no systematic relationship between the case strength of a crime and the crime’s severity.
This all seems very sensible, but on the other hand – haven’t we found that, in general, case strength ratings and punishment ratings are correlated at the level of individual observations? Do we still observe this in the present data set? And if so, what accounts for this correlation, if not crime severity?
In the raw data we do see a correlation of about 0.3 between the two ratings. However, the relationship doesn’t seem to be linear. It looks like people might be reasoning something like: if a crime isn’t punishable, then cases are necessarily weak, whereas for a very punishable crime a case can be either strong or weak. This doesn’t particularly help us explain what, other than crime severity, is driving the correlation.
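For reference, the 0.3 figure is just the raw observation-level correlation between the two response columns (names as in the model output below):

# raw correlation between the two rating scales, ~0.3 in this data set
dat[, cor(rating, rate_punishment)]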
To try to get a little more insight, we can refit the model while incorporating crime severity directly as a predictor.
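The refit formula is reproduced in the summary() output at the bottom of this post; a sketch of the corresponding call follows, with the caveat that the 0–1 rescaling of severity is a guess, since the Data: line in that output is truncated:

# hypothetical rescaling: crime's position in the spreadsheet mapped to 0-1
dat_sev <- cbind(dat, severity = dat[, (scenario - 1) / max(scenario - 1)])
mvfit_severe <- brm(
  mvbind(rating, rate_punishment) ~ 1 + severity + physical + document +
    witness + character + (1 + severity | p | uid) + (1 | q | scenario),
  data = dat_sev, chains = 4, iter = 1500, warmup = 500, thin = 3
)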
First off, we can see that the model agrees that severity has a huge effect on punishment ratings, but a small-if-any effect on case strength ratings:
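A hypothetical way to draw that comparison from the fit (since severity runs from 0 to 1, each coefficient is already the least-to-most-severe change):

# posterior intervals for the two severity coefficients
mcmc_plot(mvfit_severe,
          pars = c("b_rating_severity", "b_ratepunishment_severity"))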
The x-axis in the above plot shows the expected change in points from the least to most severe crime.
So according to the fit model, what is the source of the correlation between punishment and case strength ratings? The answer is, pretty much everything else. Most strikingly, the effects of evidence are very similar across the two scales, albeit much larger for case strength than punishment:
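A sketch of how that comparison might be pulled out of the fit (again, not the original plotting code):

# pair each evidence coefficient on the punishment scale with its
# counterpart on the case strength scale and scatter the estimates
fe <- as.data.table(fixef(mvfit_severe), keep.rownames = TRUE)  # rownames -> "rn"
fe <- fe[grepl("physical|document|witness|character", rn)]
fe[, scale := ifelse(grepl("^ratepunishment_", rn), "punishment", "strength")]
fe[, category := sub("^(rating|ratepunishment)_", "", rn)]
ggplot(dcast(fe, category ~ scale, value.var = "Estimate"),
       aes(strength, punishment, label = category)) +
  geom_point() + geom_text(vjust = -0.5)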
In the above plot, each point is an evidence category (that is, a combination of evidence type and strength/direction). Additionally, users’ baselines for case strength and punishment are correlated at 0.45 (CI = [0.24, 0.63]) (that is, people who think cases are generally stronger also think punishments should be harsher), residual response-level variability between the two scales is correlated at 0.33 (CI = [0.30, 0.37]), and, most intriguingly, a crime’s leftover punishability after crime severity is regressed out is correlated with its case strength (albeit with credible intervals crossing zero) at 0.32 (CI = [-0.05, 0.63]). That is, crimes that elicit higher punishment ratings also elicit stronger case strength ratings (probably), but in a way unrelated to the official classification of crime severity!
What does it all mean? I have no idea. I’ve included the full model output below in case anyone feels inclined to pore over the numbers trying to make sense of it all.
summary(mvfit_severe)
## Family: MV(gaussian, gaussian)
## Links: mu = identity; sigma = identity
## mu = identity; sigma = identity
## Formula: rating ~ 1 + severity + physical + document + witness + character + (1 + severity | p | uid) + (1 | q | scenario)
## rate_punishment ~ 1 + severity + physical + document + witness + character + (1 + severity | p | uid) + (1 | q | scenario)
## Data: cbind(dat, severity = dat[, (scenario - 1)/max(sce (Number of observations: 2512)
## Samples: 4 chains, each with iter = 1500; warmup = 500; thin = 3;
## total post-warmup samples = 1334
##
## Group-Level Effects:
## ~scenario (Number of levels: 31)
## Estimate Est.Error l-95% CI
## sd(rating_Intercept) 6.52 1.04 4.80
## sd(ratepunishment_Intercept) 10.49 1.50 7.97
## cor(rating_Intercept,ratepunishment_Intercept) 0.32 0.18 -0.05
## u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(rating_Intercept) 8.79 1.01 914 1190
## sd(ratepunishment_Intercept) 13.87 1.00 928 1192
## cor(rating_Intercept,ratepunishment_Intercept) 0.63 1.00 728 975
##
## ~uid (Number of levels: 81)
## Estimate Est.Error
## sd(rating_Intercept) 8.79 0.86
## sd(rating_severity) 4.94 2.13
## sd(ratepunishment_Intercept) 14.83 1.28
## sd(ratepunishment_severity) 25.02 2.51
## cor(rating_Intercept,rating_severity) 0.64 0.25
## cor(rating_Intercept,ratepunishment_Intercept) 0.45 0.10
## cor(rating_severity,ratepunishment_Intercept) 0.19 0.29
## cor(rating_Intercept,ratepunishment_severity) -0.13 0.13
## cor(rating_severity,ratepunishment_severity) -0.18 0.29
## cor(ratepunishment_Intercept,ratepunishment_severity) 0.33 0.11
## l-95% CI u-95% CI Rhat
## sd(rating_Intercept) 7.18 10.56 1.00
## sd(rating_severity) 0.67 9.04 1.01
## sd(ratepunishment_Intercept) 12.63 17.64 1.00
## sd(ratepunishment_severity) 20.39 30.40 1.00
## cor(rating_Intercept,rating_severity) -0.03 0.95 1.00
## cor(rating_Intercept,ratepunishment_Intercept) 0.24 0.63 1.01
## cor(rating_severity,ratepunishment_Intercept) -0.47 0.67 1.04
## cor(rating_Intercept,ratepunishment_severity) -0.37 0.14 1.00
## cor(rating_severity,ratepunishment_severity) -0.72 0.40 1.06
## cor(ratepunishment_Intercept,ratepunishment_severity) 0.11 0.54 1.00
## Bulk_ESS Tail_ESS
## sd(rating_Intercept) 932 1246
## sd(rating_severity) 1095 851
## sd(ratepunishment_Intercept) 805 1025
## sd(ratepunishment_severity) 1053 1211
## cor(rating_Intercept,rating_severity) 1253 1279
## cor(rating_Intercept,ratepunishment_Intercept) 658 1178
## cor(rating_severity,ratepunishment_Intercept) 116 332
## cor(rating_Intercept,ratepunishment_severity) 1081 1357
## cor(rating_severity,ratepunishment_severity) 74 277
## cor(ratepunishment_Intercept,ratepunishment_severity) 1197 1223
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat
## rating_Intercept 32.10 2.21 27.66 36.43 1.00
## ratepunishment_Intercept 52.97 2.90 47.18 58.68 1.00
## rating_severity 3.06 4.33 -5.55 11.09 1.01
## rating_physicalclear_ex -10.48 1.29 -12.99 -7.91 1.00
## rating_physicalambiguous 4.07 1.29 1.66 6.64 1.00
## rating_physicalclear_in 27.86 1.27 25.44 30.36 1.00
## rating_documentclear_ex -10.49 1.31 -13.05 -7.89 1.00
## rating_documentambiguous 4.27 1.33 1.68 6.76 1.00
## rating_documentclear_in 19.63 1.31 17.17 22.23 1.00
## rating_witnessclear_ex -7.47 1.32 -10.06 -4.93 1.00
## rating_witnessambiguous -1.69 1.25 -4.11 0.89 1.00
## rating_witnessclear_in 13.76 1.28 11.31 16.35 1.00
## rating_characterclear_ex -4.77 1.10 -7.01 -2.81 1.00
## rating_characterclear_in 8.27 1.14 5.97 10.42 1.00
## ratepunishment_severity 48.13 7.03 35.44 63.10 1.01
## ratepunishment_physicalclear_ex -1.00 1.11 -3.15 1.04 1.00
## ratepunishment_physicalambiguous 2.39 1.12 0.25 4.49 1.00
## ratepunishment_physicalclear_in 6.74 1.17 4.45 9.06 1.00
## ratepunishment_documentclear_ex -3.63 1.12 -5.73 -1.49 1.00
## ratepunishment_documentambiguous 0.55 1.13 -1.61 2.76 1.00
## ratepunishment_documentclear_in 4.83 1.14 2.66 7.16 1.00
## ratepunishment_witnessclear_ex -1.89 1.14 -4.18 0.30 1.00
## ratepunishment_witnessambiguous 0.43 1.15 -1.73 2.62 1.00
## ratepunishment_witnessclear_in 5.54 1.13 3.35 7.72 1.00
## ratepunishment_characterclear_ex -1.59 0.96 -3.44 0.28 1.00
## ratepunishment_characterclear_in 3.16 0.97 1.32 4.99 1.00
## Bulk_ESS Tail_ESS
## rating_Intercept 1051 1116
## ratepunishment_Intercept 938 1160
## rating_severity 868 1110
## rating_physicalclear_ex 1293 1181
## rating_physicalambiguous 1326 1320
## rating_physicalclear_in 1361 1265
## rating_documentclear_ex 1315 1212
## rating_documentambiguous 1401 1283
## rating_documentclear_in 1312 1240
## rating_witnessclear_ex 1415 1246
## rating_witnessambiguous 1304 1064
## rating_witnessclear_in 1293 1248
## rating_characterclear_ex 1421 1177
## rating_characterclear_in 1216 1148
## ratepunishment_severity 948 1194
## ratepunishment_physicalclear_ex 1181 1342
## ratepunishment_physicalambiguous 1353 1394
## ratepunishment_physicalclear_in 1406 1247
## ratepunishment_documentclear_ex 1179 1282
## ratepunishment_documentambiguous 1410 1192
## ratepunishment_documentclear_in 1381 1366
## ratepunishment_witnessclear_ex 1320 1135
## ratepunishment_witnessambiguous 1389 1247
## ratepunishment_witnessclear_in 1243 1322
## ratepunishment_characterclear_ex 1299 1284
## ratepunishment_characterclear_in 1284 1324
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sigma_rating 22.56 0.33 21.91 23.20 1.00 1485
## sigma_ratepunishment 19.56 0.28 19.04 20.12 1.00 1418
## Tail_ESS
## sigma_rating 1276
## sigma_ratepunishment 1341
##
## Residual Correlations:
## Estimate Est.Error l-95% CI u-95% CI Rhat
## rescor(rating,ratepunishment) 0.33 0.02 0.30 0.37 1.00
## Bulk_ESS Tail_ESS
## rescor(rating,ratepunishment) 1682 1368
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).