All data and source code can be found here: https://github.com/judyseinkim/Intuitive-Theories-of-Color
Across Experiments 1 and 3, participants named the colors of 54 objects (Exp 1: 30 objects, "What is one common color of... ?"; Exp 3: 24 objects, "What is the most common color of... ?"). Objects were drawn from three broader types: natural kinds (NK; e.g., lemons), artifacts with non-functional colors (A-NFC; e.g., cars), and artifacts with functional colors (A-FC; e.g., stop signs).
In the table below, you can find the colors provided by all individual blind and sighted participants (use the search tool to filter by specific objects or participants). Data for function and filler trials are not shown here, but can be found in 'real_objects_all_raw.csv' in the repository. A note about missing data: responses for 'dollar bill' were not collected for half of the participants (in both the blind and sighted groups) due to experimenter error.
For each object, we quantified naming agreement using Simpson's Diversity Index (Majid et al., 2018; Kim et al., 2019). For the unique color words (1 to R) provided for each object across all participants within a group (blind or sighted), a naming agreement score was calculated according to the equation below, where N is the total number of words used across participants for each object and n_i is the number of times each unique word i (1 to R) was provided. The index ranges from 0 to 1, where 0 indicates that the same color word was never used by two participants (i.e., low color naming agreement) and 1 indicates that all participants provided the same color word (i.e., high naming agreement).
Although participants were instructed to provide one color, a few provided multiple (at most 3; e.g., "red, white, and blue"). All of these colors were included in the analysis. Further, a small proportion of participants said "I don't know" or provided words that were not typical color terms (e.g., dark, light, beige, neon). These responses were treated the same as color terms ("I don't know" counted as a single word, coded "IDK").
\[D=\frac{\sum_{i=1}^R n_i(n_i-1)}{N(N-1)}\]
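For concreteness, here is a minimal R sketch of this computation (the function name and the toy inputs are ours; the formula is the one above):

```r
# Simpson's Diversity Index for one object's color words, per the equation
# above: sum over unique words i of n_i * (n_i - 1), divided by N * (N - 1).
simpson_index <- function(words) {
  n <- table(words)   # n_i: count of each unique color word
  N <- sum(n)         # N: total number of words for this object
  sum(n * (n - 1)) / (N * (N - 1))
}

simpson_index(c("yellow", "yellow", "yellow", "green"))  # 0.5
simpson_index(c("red", "green", "blue"))                 # 0: no agreement
simpson_index(rep("yellow", 10))                         # 1: full agreement
```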
This table presents the same data as above, this time organized by the frequency counts of the color words provided for each object (within group).
The table below shows the SDI for each object.
The bar graph below shows SDIs averaged across object types within group (error bars are 95% confidence intervals across items).
To examine differences across groups, we performed linear mixed-effects regression on log-transformed SDIs (using lme4, with objects as random effects). Results are summarized in the tables below. As reported in our paper, there is a large difference between groups in color naming agreement, but no group-by-kind interaction.
| Term | Partial ω² |
|---|---|
| Group | 0.571 |
| Kind | 0.250 |
| Group:Kind | -0.010 |

| Term | Chisq | Df | Pr(>Chisq) |
|---|---|---|---|
| Group | 71.109 | 1 | 0.000 |
| Kind | 20.000 | 2 | 0.000 |
| Group:Kind | 1.487 | 2 | 0.475 |
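A minimal sketch of how such a model could be fit (the data frame name d.sdi and its column names are our assumptions; car::Anova is one way to obtain Wald chi-square tests like those in the table, though we note the exact calls used may differ):

```r
library(lme4)
library(car)

# Linear mixed-effects regression on log-transformed SDIs, with objects as
# random effects (d.sdi is a hypothetical frame: one row per object x group).
m.sdi <- lmer(log(SDI) ~ Group * Kind + (1 | Object), data = d.sdi)
Anova(m.sdi, type = 3)  # chi-square tests for Group, Kind, and Group:Kind
```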
Participants judged the likelihood that two objects randomly chosen from the same category (e.g., two lemons) would have the same color. They did so for natural kinds, artifacts with non-functional colors, and artifacts with functional colors, for both real objects (Experiment 1) and novel objects (Experiment 2). In a control condition, participants also judged the likelihood that two people chosen at random would do the same thing with an object (e.g., a leaf vs. a car). Participants rated consistency likelihood on a scale of 1 to 7 (1: not likely, 7: very likely).
Both sighted and blind participants showed a double-dissociation between object kind (natural vs. artifacts) and trial type (color vs. usage), as shown below. For usage trials, participants rated the likelihood that the object would be used for the same purpose as low for natural kinds and high for artifacts. In contrast, for color trials, participants judged that natural kinds are more likely than artifacts to have the same color. In addition, both blind and sighted participants knew that not all artifacts are alike: those with function-relevant colors (e.g., stop signs) were judged to have high color consistency.
Consistency likelihood judgments were analyzed using ordinal logistic regression (using the clmm function from the ordinal package, with participants and objects as random effects). These data can be found in 'data/real_object_inf_allData_exp1.csv'.
First, we compared group differences for natural kinds and artifacts with non-functional color only (since artifacts with functional color are a special category). This also allows us to look at a group (blind vs. sighted) x object kind (natural vs. artifact) x trial type (color vs. usage) interaction. Baselines are coded as sighted group, usage trial, artifact.
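A sketch of the model call (the formula and data frame name come from the output below; the model object name is ours, and Rating is assumed to be an ordered factor):

```r
library(ordinal)

# Cumulative link mixed model (ordinal logistic regression) with random
# intercepts for participants and objects.
m.realInf <- clmm(Rating ~ Group * TrialType * Kind +
                    (1 | Subject) + (1 | Object),
                  data = d.realInf, link = "logit")
summary(m.realInf)
```

The color-only and novel-object models reported further below follow the same pattern, swapping in the corresponding formula (e.g., `Rating ~ Group * Kind`) and data frame (`d.realColor`, `d.novelInf`, `d.novelColor`).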
## Cumulative Link Mixed Model fitted with the Laplace approximation
##
## formula: Rating ~ Group * TrialType * Kind + (1 | Subject) + (1 | Object)
## data: d.realInf
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 1560 -2366.13 4762.26 1618(4873) 3.56e-03 6.7e+02
##
## Random effects:
## Groups Name Variance Std.Dev.
## Object (Intercept) 0.5593 0.7479
## Subject (Intercept) 0.6778 0.8233
## Number of groups: Object 40, Subject 39
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## GroupCB 0.8819 0.3331 2.648 0.008097 **
## TrialTypeColor -3.3801 0.3934 -8.593 < 2e-16 ***
## KindNatural -2.9265 0.3909 -7.487 7.02e-14 ***
## GroupCB:TrialTypeColor -0.8920 0.2695 -3.309 0.000935 ***
## GroupCB:KindNatural -0.7156 0.2698 -2.653 0.007983 **
## TrialTypeColor:KindNatural 7.1742 0.5729 12.523 < 2e-16 ***
## GroupCB:TrialTypeColor:KindNatural -0.2387 0.3779 -0.632 0.527605
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## 1|2 -5.3561 0.3638 -14.721
## 2|3 -4.2509 0.3540 -12.009
## 3|4 -3.0407 0.3466 -8.773
## 4|5 -2.1722 0.3419 -6.352
## 5|6 -0.7994 0.3360 -2.379
## 6|7 0.7705 0.3358 2.294
We also looked at a group comparison for color trials only, this time including all three kinds of objects (natural, artifact with functional color, artifact with non-functional color).
## Cumulative Link Mixed Model fitted with the Laplace approximation
##
## formula: Rating ~ Group * Kind + (1 | Subject) + (1 | Object)
## data: d.realColor
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 1149 -1679.70 3385.39 1231(7794) 4.77e-04 9.5e+02
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 0.8452 0.9193
## Object (Intercept) 2.0077 1.4169
## Number of groups: Subject 39, Object 30
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## GroupCB -0.04353 0.34470 -0.126 0.899502
## KindNatural 4.30389 0.67430 6.383 1.74e-10 ***
## KindArtifactFC 2.84198 0.66869 4.250 2.14e-05 ***
## GroupCB:KindNatural -1.01518 0.26866 -3.779 0.000158 ***
## GroupCB:KindArtifactFC 0.44737 0.27101 1.651 0.098795 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## 1|2 -2.1573 0.5225 -4.129
## 2|3 -0.9547 0.5155 -1.852
## 3|4 0.4463 0.5151 0.866
## 4|5 1.2028 0.5168 2.328
## 5|6 2.5422 0.5221 4.870
## 6|7 4.1583 0.5300 7.845
## (21 observations deleted due to missingness)
Note that the same color consistency questions were asked for the additional 24 objects used in Experiment 3. The figure below shows these data combined with the data above. These data were not combined in the main paper because the number of trials becomes unbalanced for color vs. usage trials (and the results are the same either way). These data can be found in 'data/real_object_inf_allData_exp1_and_3.csv'.
Initially, the distinction between functionally relevant and non-functionally relevant colors was decided by the experimenters. In reality, this is likely not an either-or distinction: the colors of an artifact may have varying levels of relevance to its function. Therefore, after all the main data had been collected, we additionally obtained MTurk ratings (n=20) of the functional relevance of color to each artifact. Participants were asked "How important is the color of a XXX to its function?" and rated on a scale of 1 (not at all relevant) to 7 (very relevant). Below are the average ratings, by object.
We then split the artifacts into those with functional vs. non-functional colors according to the MTurk ratings, classifying artifacts with a mean rating above 4 as having functional colors. Both blind and sighted participants' consistency ratings were correlated with the functional relevance judgments (outputs below: sighted group, then blind group). We also fit a cumulative link mixed model predicting individual consistency ratings from the continuous functional-relevance ratings (final output below).
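The correlations were computed along these lines (the data frame and column names appear in the output below; our reading is that s_/cb_ prefixes denote the sighted and blind groups, Rating_M.x the mean consistency rating, and Rating_M.y the mean relevance rating):

```r
# Spearman correlations between per-object mean consistency ratings and
# per-object mean MTurk functional-relevance ratings, one test per group.
cor.test(s_artifact_cons$Rating_M.y, s_artifact_cons$Rating_M.x,
         method = "spearman")   # sighted group
cor.test(cb_artifact_cons$Rating_M.y, cb_artifact_cons$Rating_M.x,
         method = "spearman")   # blind group
```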
##
## Spearman's rank correlation rho
##
## data: s_artifact_cons$Rating_M.y and s_artifact_cons$Rating_M.x
## S = 601.73, p-value = 0.01245
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.5475743
##
## Spearman's rank correlation rho
##
## data: cb_artifact_cons$Rating_M.y and cb_artifact_cons$Rating_M.x
## S = 532.1, p-value = 0.00517
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.5999248
## Cumulative Link Mixed Model fitted with the Laplace approximation
##
## formula: Rating ~ Group * Rating_M.y + (1 | Subject) + (1 | Object)
## data: d.real_artifact_ind
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 759 -1156.86 2335.72 946(6083) 1.16e-04 2.9e+03
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 1.243 1.115
## Object (Intercept) 3.101 1.761
## Number of groups: Subject 39, Object 20
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## GroupS 0.12087 0.53420 0.226 0.820998
## Rating_M.y 1.16396 0.30382 3.831 0.000128 ***
## GroupS:Rating_M.y -0.06707 0.10663 -0.629 0.529358
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## 1|2 0.281 1.213 0.232
## 2|3 1.618 1.210 1.338
## 3|4 3.220 1.212 2.658
## 4|5 4.028 1.215 3.317
## 5|6 5.487 1.224 4.483
## 6|7 6.909 1.236 5.590
## (21 observations deleted due to missingness)
Inferences for novel objects were nearly identical to inferences for real objects. Again, both groups showed a double-dissociation between object kind (natural vs. artifacts) and trial type (color vs. usage).
As with real objects, consistency likelihood judgments were analyzed using ordinal logistic regression.
We start with a group comparison for natural kinds and artifacts with non-functional color only, testing for a group (blind vs. sighted) x object kind (natural vs. artifact) x trial type (color vs. usage) interaction (baselines: sighted group, usage trial, artifact).
## Cumulative Link Mixed Model fitted with the Laplace approximation
##
## formula: Rating ~ Group * TrialType * Kind + (1 | Subject) + (1 | Object)
## data: d.novelInf
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 780 -1255.67 2541.34 1475(4428) 1.35e-03 4.1e+02
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 0.7090 0.8420
## Object (Intercept) 0.1659 0.4073
## Number of groups: Subject 39, Object 20
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## GroupCB 0.4123 0.3864 1.067 0.286
## TrialTypeColor -2.4870 0.3750 -6.632 3.32e-11 ***
## KindNatural -3.6693 0.3895 -9.420 < 2e-16 ***
## GroupCB:TrialTypeColor -0.1329 0.3716 -0.358 0.721
## GroupCB:KindNatural 0.0934 0.3769 0.248 0.804
## TrialTypeColor:KindNatural 6.0872 0.5579 10.911 < 2e-16 ***
## GroupCB:TrialTypeColor:KindNatural -0.1062 0.5235 -0.203 0.839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## 1|2 -5.2395 0.3858 -13.580
## 2|3 -3.7590 0.3602 -10.436
## 3|4 -2.5654 0.3461 -7.412
## 4|5 -1.6954 0.3374 -5.025
## 5|6 -0.4750 0.3300 -1.439
## 6|7 0.7278 0.3312 2.197
Next, a group comparison for color trials only, again including all three kinds of objects (natural, artifact with functional color, artifact with non-functional color).
## Cumulative Link Mixed Model fitted with the Laplace approximation
##
## formula: Rating ~ Group * Kind + (1 | Subject) + (1 | Object)
## data: d.novelColor
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 585 -922.80 1871.60 1045(3138) 1.26e-03 2.5e+02
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 0.6847 0.8275
## Object (Intercept) 0.1973 0.4442
## Number of groups: Subject 39, Object 15
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## GroupCB 0.30553 0.36516 0.837 0.403
## KindNatural 2.44672 0.39180 6.245 4.24e-10 ***
## KindArtifactFC 2.41734 0.39409 6.134 8.57e-10 ***
## GroupCB:KindNatural -0.06636 0.36379 -0.182 0.855
## GroupCB:KindArtifactFC 0.52348 0.37738 1.387 0.165
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## 1|2 -2.71047 0.39483 -6.865
## 2|3 -1.29903 0.34211 -3.797
## 3|4 -0.04924 0.33240 -0.148
## 4|5 0.80087 0.33520 2.389
## 5|6 1.98669 0.34566 5.748
## 6|7 3.29100 0.36004 9.141
We collected color-function relevance data separately for novel objects (n=25 MTurk participants). Below are the average ratings, by object.
As with real artifacts, novel artifacts were split into those with functional vs. non-functional colors according to the MTurk ratings. Both blind and sighted participants' consistency ratings were correlated with the functional relevance judgments (outputs below: sighted group, then blind group), and we again fit a cumulative link mixed model predicting individual consistency ratings from the continuous functional-relevance ratings (final output below).
##
## Spearman's rank correlation rho
##
## data: s_novel_artifact_cons$Rating_M.y and s_novel_artifact_cons$Rating_M.x
## S = 32, p-value = 0.03112
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.7333333
##
## Spearman's rank correlation rho
##
## data: cb_novel_artifact_cons$Rating_M.y and cb_novel_artifact_cons$Rating_M.x
## S = 52, p-value = 0.1206
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.5666667
## Cumulative Link Mixed Model fitted with the Laplace approximation
##
## formula: Rating ~ Group * Rating_M.y + (1 | Subject) + (1 | Object)
## data: d.novel_artifact_ind
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 351 -579.35 1180.69 898(3598) 4.37e-05 4.8e+03
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 1.334 1.155
## Object (Intercept) 1.061 1.030
## Number of groups: Subject 39, Object 9
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## GroupS 1.2228 1.1434 1.069 0.28488
## Rating_M.y 1.4078 0.4910 2.867 0.00414 **
## GroupS:Rating_M.y -0.4323 0.2626 -1.646 0.09976 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## 1|2 1.176 2.068 0.569
## 2|3 2.781 2.058 1.351
## 3|4 4.203 2.061 2.040
## 4|5 5.113 2.066 2.475
## 5|6 6.342 2.077 3.054
## 6|7 7.636 2.091 3.651
## (39 observations deleted due to missingness)
In Experiment 3, blind and sighted participants were asked a series of questions about the colors of objects:

1. What is the most common color of apples?
2. Are all apples ____? If no, please list the other colors of apples.
3. If you picked two apples at random...
4. Are all parts of an apple a single color, or does the color vary across the apple? If it varies, how does the color vary over the apple?
5. Why are apples that (those) color (colors)?

The answers to Question 5 were analyzed according to the procedure below.
Note that at the start of the experiment, participants were instructed: "This question is meant to be very open-ended, so you should provide whatever explanation seems right to you." In addition, since the preceding questions asked about the most common color as well as how color might vary, participants were additionally instructed: "If you had answered that “All pies are crumbly,” then you should provide an explanation for why all pies are crumbly. If you had answered that “No, not all pies are crumbly, some are smooth, flaky, and so on,” then you should provide an answer for why pies are all of those different textures." Therefore, most of the explanations provided are about the common color of objects, but some are also about why colors vary across instances of the same object. We did not separate these answers, but instead created an "it varies" category when coding explanations by type.
All raw explanations can be found in 'data/explanations.csv'.
Explanation types were decided by the experimenters based on examining all the explanations (blind to group and object). There were 9 types: ‘process’, ‘depends on’, ‘just is’, ‘material’, ‘social/aesthetic’, ‘maker’, ‘visibility’, ‘convention’, and ‘I don’t know’. Below is a key that the coders used to tag explanations by type.
Explanations were coded by four coders who did not know which object or group each explanation came from. Note, however, that in a small number of instances participants said the object's name in their explanations, and at other times, it was fairly easy to discern the object from the explanation.
There was large variability in how many words participants used in their explanations (range = 1 to 165 words, M = 13 words). This meant that each explanation (i.e., what one participant said for one object) could contain multiple explanation types. For example, a participant's answer that the color of a wedding dress is due to “symbolism, or personal style” was coded as containing both a ‘convention’ (for symbolism) and a ‘social/aesthetic’ (for personal style) explanation. However, the same word or phrase (e.g., personal style) was never coded for more than one explanation type.
Some participants gave lengthier explanations than others, without necessarily providing additional information (e.g., often telling anecdotal stories to make a point). For wedding dress, for instance, another participant explained: “well, there's something about tradition, and white being associated with purity and virginity and all that, but beyond that it's just a matter of demand, if you want a baby barf green wedding dress that's your problem”. This explanation was also coded with ‘convention’ and ‘social/aesthetic’.
Codes were then filtered according to the criterion that at least three out of four coders had to agree. The first author (a 5th coder) made some additional changes, again blind to group and object. After this process, the number of explanation types per explanation (again, a single explanation from one participant for one object) ranged from 1 to 3 (mean = 1.26).
We examined how similar explanations were across groups by computing Spearman's correlations across groups within object kind, and across kinds within group.
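As an illustrative sketch only (the data layout here is our assumption: one frequency per explanation type, in the same type order for both groups, with toy values), such a correlation could be computed as:

```r
# Hypothetical example: explanation-type frequencies for one object kind,
# one entry per explanation type, for each group (values are illustrative).
blind_freq   <- c(process = 10, depends_on = 3, just_is = 5, material = 2)
sighted_freq <- c(process = 12, depends_on = 2, just_is = 6, material = 1)

# Spearman correlation between the two groups' explanation-type profiles
cor.test(blind_freq, sighted_freq, method = "spearman")
```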