Hi Sanmi! A quick question for you about Kendall’s Tau in R. Here I have two sets - both just monotonic rankings:

load("/home/vanessa/Desktop/similarSets.Rda")
s$set1
## 141 137 135 139 133 140 165 190 167 160 450 166 173 168 170 172 169 455 
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18 
## 440 174 163 441 185 436 189 437 459 442 456 453 451 464 162 445 156 187 
##  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36 
## 300 118 474 181 447 120 470 164 177 161 175 129 457 122 461 179 188 306 
##  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54 
## 183 119 299 148 466 124 115 138 178 539 303 157 180 439 186 116 305 144 
##  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72 
## 155 443 308 544 149 126 132 154 458 152 528 533 142 117 121 143 307 472 
##  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90 
## 538 444 147 526 153 460 158 302 146 537 136 309 310 468 540 529 151 542 
##  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 
## 530 532 159 543 541 304 184 546 531 545 127 131 449 125 130 123 535 469 
## 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 
## 527 454 465 448 301 534 446 536 463 438 452 311 171 471 150 462 145 
## 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
s$set2
## 141 137 135 139 133 140 165 190 167 160 450 166 173 168 170 172 169 455 
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18 
## 440 174 163 441 185 436 189 437 459 442 456 453 451 464 162 445 156 187 
##  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36 
## 300 118 474 181 447 120 470 164 177 161 175 129 457 122 461 179 188 306 
##  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54 
## 183 119 299 148 466 115 124 138 178 539 303 157 180 439 186 116 305 144 
##  55  56  57  58  59  61  60  62  63  64  65  66  67  68  69  70  71  72 
## 155 443 308 544 149 126 132 154 458 152 528 533 142 117 121 143 307 472 
##  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90 
## 538 444 147 526 153 460 158 302 146 537 136 309 310 468 529 540 151 542 
##  91  92  93  94  95  96  97  98  99 100 101 102 103 104 106 105 107 108 
## 530 532 159 543 541 304 184 546 531 545 127 131 449 125 130 123 535 469 
## 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 
## 527 454 465 448 301 534 446 536 463 438 452 311 171 471 150 462 145 
## 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
# The names are ID labels (we can see the sets are almost identical)
names(s$set1) == names(s$set2)
##   [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [23]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [34]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [45]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [56]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [67]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [78]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [89]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [100]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
## [111]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [122]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [133]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# We can also look at the rankings themselves (that go into the calculation)
# Again we see sets are almost identical
s$set1 == s$set2
##   141   137   135   139   133   140   165   190   167   160   450   166 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
##   173   168   170   172   169   455   440   174   163   441   185   436 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
##   189   437   459   442   456   453   451   464   162   445   156   187 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
##   300   118   474   181   447   120   470   164   177   161   175   129 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
##   457   122   461   179   188   306   183   119   299   148   466   124 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE 
##   115   138   178   539   303   157   180   439   186   116   305   144 
## FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
##   155   443   308   544   149   126   132   154   458   152   528   533 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
##   142   117   121   143   307   472   538   444   147   526   153   460 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
##   158   302   146   537   136   309   310   468   540   529   151   542 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE 
##   530   532   159   543   541   304   184   546   531   545   127   131 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
##   449   125   130   123   535   469   527   454   465   448   301   534 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
##   446   536   463   438   452   311   171   471   150   462   145 
##  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

So then I use cor.test with method = "kendall" to compute the tau statistic:

tau = cor.test(s$set1, s$set2, method = "kendall", conf.level = 0.95)
summary(tau)
##             Length Class  Mode     
## statistic   1      -none- numeric  
## parameter   0      -none- NULL     
## p.value     1      -none- numeric  
## estimate    1      -none- numeric  
## null.value  1      -none- numeric  
## alternative 1      -none- character
## method      1      -none- character
## data.name   1      -none- character

And here is my silly question. I was seeing strange behavior with the p-values: the p-value comes out as exactly 0 when the sets are nearly identical:

tau$p.value
## [1] 0

I also noticed that the “null” is specified as 0:

tau$null.value
## tau 
##   0
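
To make this reproducible without the .Rda file, here is a minimal sketch with synthetic rankings. The vectors x and y are made up to mimic the two adjacent swaps above; they are not my real data. As far as I can tell from ?cor.test, the test is for association, so a null of tau = 0 would mean "no association" rather than "no difference":

```r
# Synthetic stand-ins for s$set1 and s$set2 (hypothetical, not the real data)
x <- 1:143
y <- x
y[c(60, 61)]   <- y[c(61, 60)]    # first adjacent swap
y[c(105, 106)] <- y[c(106, 105)]  # second adjacent swap

res <- cor.test(x, y, method = "kendall")
res$estimate    # tau, very close to 1
res$null.value  # the hypothesised tau under the null
res$p.value     # very small for rankings that agree this closely
```

If that reading is right, a tiny p-value here goes together with a tau near 1, which is the reverse of how I first read it.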

And there isn’t much else embedded in this tau object:

str(tau)
## List of 8
##  $ statistic  : Named num 17.7
##   ..- attr(*, "names")= chr "z"
##  $ parameter  : NULL
##  $ p.value    : num 0
##  $ estimate   : Named num 1
##   ..- attr(*, "names")= chr "tau"
##  $ null.value : Named num 0
##   ..- attr(*, "names")= chr "tau"
##  $ alternative: chr "two.sided"
##  $ method     : chr "Kendall's rank correlation tau"
##  $ data.name  : chr "s$set1 and s$set2"
##  - attr(*, "class")= chr "htest"
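
One thing I checked by hand: the statistic is labelled z, so I assume the p-value comes from a normal approximation (that is my guess; I have not verified it in the source of cor.test). Under that assumption, a z of 17.7 is so far into the tail that the way the tail probability is computed decides whether it shows up as exactly 0:

```r
z <- 17.7                      # value shown by str(tau) above (rounded)
p_lower <- 2 * pnorm(-abs(z))  # two-sided tail computed directly
p_upper <- 2 * (1 - pnorm(z))  # same quantity via 1 - CDF
p_lower                        # tiny but positive
p_upper                        # rounds to exactly 0 in double precision
```

So an exact 0 could simply be floating-point underflow of a very small p-value, not a special value.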

So I wanted to get your feedback on this: is this the package / implementation that you use, and how do you generally interpret the output and p-values? My thinking is that if it is exactly zero (e.g., “0”), this translates to the null value (no significant difference), but if it is something slightly off, that corresponds to an actual p-value? I am doing a bunch of these tests and then correcting for multiple comparisons, and as you can imagine, all the zeros would be interpreted as the opposite of what I think they should be. Thanks for your advice on this!
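
For context, the multiple-comparison step I have in mind looks roughly like this sketch. The p-values are made up, and Benjamini-Hochberg is just one choice of method that p.adjust offers:

```r
# Hypothetical p-values from several tests, including an exact zero
p_raw <- c(0, 0.01, 0.04, 0.5)
p_adj <- p.adjust(p_raw, method = "BH")  # Benjamini-Hochberg; other methods exist
p_adj
p_adj[1] == 0  # a zero stays zero, so it is always flagged as significant
```

This is why the zeros matter so much: after correction they are still the most significant results in the batch.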