Plots

Frequency

Frequency of (non-)colexification against WordNet relationship types

In general, this coarse pattern is also as expected: a pair of concepts not being expressed by the same form in a language is much more frequent than the pair colexifying. I’m not sure there’s much more to be said. The counts for colexifying pairs are all very close to 0.

Proportions

Proportions against relationship type. This looks pretty much like what you hypothesized, right? Proportion of colexified antonyms < hypernyms ~ meronyms. And these proportions are all higher than those of pairs that stand in no relationship.

Proportions against cosine similarity, per relationship type. This one is, again, less informative to me:
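As a numeric complement to this plot, one option would be a rank correlation between cosine similarity and colexification proportion within each relationship type. The sketch below runs on made-up data; `cos.sim` is a hypothetical column name, since these notes don't show what the similarity column is called.

```r
# Toy data for illustration only; `cos.sim` is an assumed column name.
df_toy <- data.frame(
  wn.relation = rep(c("antonymy", "hypernymy"), each = 5),
  cos.sim     = c(0.10, 0.25, 0.40, 0.55, 0.70, 0.15, 0.30, 0.45, 0.60, 0.75),
  colex.prop  = c(0.00, 0.00, 0.01, 0.02, 0.05, 0.06, 0.00, 0.03, 0.01, 0.00)
)

# Spearman's rho per relationship type. It is rank-based, so the heavy
# skew of the proportions matters less than it would for Pearson's r.
rho_by_relation <- sapply(
  split(df_toy, df_toy$wn.relation),
  function(d) cor(d$cos.sim, d$colex.prop, method = "spearman")
)
rho_by_relation
```

With most proportions at exactly 0, the real data will have many ties, so any such coefficient should be read as a rough indication only.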

Some descriptive statistics

Frequencies

For colexifying pairs

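library(dplyr)  # for %>% and filter()
# Counts for colexifying pairs only, summarised per WordNet relation
df_freq_pos <- df_freq %>% filter(type == "colex_pos")
tapply(df_freq_pos$freq, df_freq_pos$wn.relation, summary)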
## $antonymy
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00    8.35    4.00  102.00 
## 
## $hypernymy
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   1.000   9.438   6.000 235.000 
## 
## $meronymy
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    2.00   23.45   19.50  300.00 
## 
## $no.rel
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.1412   0.0000 263.0000

For non-colexifying pairs

# Counts for non-colexifying pairs only, summarised per WordNet relation
df_freq_neg <- df_freq %>% filter(type == "colex_neg")
tapply(df_freq_neg$freq, df_freq_neg$wn.relation, summary)
## $antonymy
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    36.0   284.8   452.0   580.8   700.2  1830.0 
## 
## $hypernymy
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    10.0   158.2   245.0   294.8   332.5  1924.0 
## 
## $meronymy
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    30.0   288.5   584.0   618.5   774.5  1971.0 
## 
## $no.rel
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0   129.0   223.0   245.2   303.0  2076.0
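The counts are heavily right-skewed (means sit far above medians), so the medians may be the more useful figure to compare. As a rough sketch, both pair types can be tabulated against relationship type in a single `tapply` call. The data frame below is made up for illustration and only borrows the column names used above.

```r
# Toy stand-in for df_freq; the values are invented, not taken from the real data.
df_freq_toy <- data.frame(
  wn.relation = rep(c("antonymy", "no.rel"), each = 4),
  type        = rep(c("colex_pos", "colex_neg"), times = 4),
  freq        = c(0, 300, 4, 500, 0, 200, 0, 250)
)

# Passing two grouping vectors gives a relation-by-type matrix of medians.
med <- tapply(
  df_freq_toy$freq,
  list(df_freq_toy$wn.relation, df_freq_toy$type),
  median
)
med
```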

Proportions

tapply(df_prop$colex.prop, df_prop$wn.relation, summary)
## $antonymy
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.000000 0.000000 0.017144 0.004884 0.227692 
## 
## $hypernymy
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.000000 0.003098 0.040312 0.036972 0.545454 
## 
## $meronymy
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.000000 0.003802 0.037331 0.032796 0.372294 
## 
## $no.rel
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000000 0.0000000 0.0000000 0.0004481 0.0000000 0.8694030

This pretty much mirrors the plot above.
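For reference, here is a minimal sketch of how `colex.prop` could be derived from the frequency table. It assumes the proportion is the colexifying count divided by the total count per pair, which these notes do not state explicitly. The `pair` column and all values are made up.

```r
# Toy long-format counts: one row per pair and type (assumed layout).
df_freq_toy <- data.frame(
  pair        = rep(c("p1", "p2", "p3"), each = 2),
  wn.relation = rep(c("antonymy", "hypernymy", "no.rel"), each = 2),
  type        = rep(c("colex_pos", "colex_neg"), times = 3),
  freq        = c(2, 98, 6, 54, 0, 40)
)

# Split by type and join the two counts back together per pair.
pos <- df_freq_toy[df_freq_toy$type == "colex_pos", c("pair", "wn.relation", "freq")]
neg <- df_freq_toy[df_freq_toy$type == "colex_neg", c("pair", "freq")]
df_prop_toy <- merge(pos, neg, by = "pair", suffixes = c(".pos", ".neg"))

# Assumed definition: colexifying count over total count for the pair.
df_prop_toy$n.total    <- df_prop_toy$freq.pos + df_prop_toy$freq.neg
df_prop_toy$colex.prop <- df_prop_toy$freq.pos / df_prop_toy$n.total

tapply(df_prop_toy$colex.prop, df_prop_toy$wn.relation, summary)
```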

Another look at proportions, but filtering out pairs that appear fewer than 30 times
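A minimal sketch of this filtering step, assuming "appear" refers to the total count of a pair (colexified plus non-colexified) and that both counts sit side by side in a wide table. The names `df_counts`, `freq.pos`, `freq.neg` and `n.total` are hypothetical, and the numbers are toy values.

```r
# Hypothetical wide table: one row per pair with both counts.
df_counts <- data.frame(
  pair        = c("p1", "p2", "p3", "p4"),
  wn.relation = c("antonymy", "hypernymy", "meronymy", "no.rel"),
  freq.pos    = c(2, 6, 3, 0),
  freq.neg    = c(98, 5, 27, 12)
)

min_obs <- 30  # threshold named in the heading

# Assumption: a pair "appears" whenever it is attested, colexified or not.
df_counts$n.total <- df_counts$freq.pos + df_counts$freq.neg

# Drop pairs with fewer than min_obs observations, then compute proportions.
df_filtered <- df_counts[df_counts$n.total >= min_obs, ]
df_filtered$colex.prop <- df_filtered$freq.pos / df_filtered$n.total

df_filtered[, c("pair", "n.total", "colex.prop")]
```

Pairs with exactly 30 observations are kept here, since the cut only removes those with fewer than 30. The point of the threshold is that proportions computed over very small totals are noisy.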

Proportions against relationship type. After filtering, this still looks pretty much like what you hypothesized, right? Proportion of colexified antonyms < hypernyms ~ meronyms. And these proportions are all higher than those of pairs that stand in no relationship.

Proportions against cosine similarity, per relationship type. This one is, again, less informative to me:

tapply(df_prop$colex.prop, df_prop$wn.relation, summary)
## $antonymy
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.000000 0.000000 0.017144 0.004884 0.227692 
## 
## $hypernymy
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.000000 0.002918 0.035569 0.027221 0.545454 
## 
## $meronymy
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.000000 0.003802 0.037331 0.032796 0.372294 
## 
## $no.rel
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000000 0.0000000 0.0000000 0.0004708 0.0000000 0.8694030