Hi Sanmi! A quick question for you about Kendall’s Tau in R. Here I have two sets - both just monotonic rankings:
load("/home/vanessa/Desktop/similarSets.Rda")
s$set1
## 141 137 135 139 133 140 165 190 167 160 450 166 173 168 170 172 169 455
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## 440 174 163 441 185 436 189 437 459 442 456 453 451 464 162 445 156 187
## 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## 300 118 474 181 447 120 470 164 177 161 175 129 457 122 461 179 188 306
## 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## 183 119 299 148 466 124 115 138 178 539 303 157 180 439 186 116 305 144
## 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## 155 443 308 544 149 126 132 154 458 152 528 533 142 117 121 143 307 472
## 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## 538 444 147 526 153 460 158 302 146 537 136 309 310 468 540 529 151 542
## 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
## 530 532 159 543 541 304 184 546 531 545 127 131 449 125 130 123 535 469
## 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## 527 454 465 448 301 534 446 536 463 438 452 311 171 471 150 462 145
## 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
s$set2
## 141 137 135 139 133 140 165 190 167 160 450 166 173 168 170 172 169 455
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## 440 174 163 441 185 436 189 437 459 442 456 453 451 464 162 445 156 187
## 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## 300 118 474 181 447 120 470 164 177 161 175 129 457 122 461 179 188 306
## 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## 183 119 299 148 466 115 124 138 178 539 303 157 180 439 186 116 305 144
## 55 56 57 58 59 61 60 62 63 64 65 66 67 68 69 70 71 72
## 155 443 308 544 149 126 132 154 458 152 528 533 142 117 121 143 307 472
## 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## 538 444 147 526 153 460 158 302 146 537 136 309 310 468 529 540 151 542
## 91 92 93 94 95 96 97 98 99 100 101 102 103 104 106 105 107 108
## 530 532 159 543 541 304 184 546 531 545 127 131 449 125 130 123 535 469
## 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## 527 454 465 448 301 534 446 536 463 438 452 311 171 471 150 462 145
## 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
# The names are ID labels (we can see the sets are almost identical)
names(s$set1) == names(s$set2)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [12] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [23] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [34] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [45] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [56] TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
## [67] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [78] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [89] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [100] TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
## [111] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [122] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [133] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# We can also look at the rankings themselves (that go into the calculation)
# Again we see sets are almost identical
s$set1 == s$set2
## 141 137 135 139 133 140 165 190 167 160 450 166
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 173 168 170 172 169 455 440 174 163 441 185 436
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 189 437 459 442 456 453 451 464 162 445 156 187
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 300 118 474 181 447 120 470 164 177 161 175 129
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 457 122 461 179 188 306 183 119 299 148 466 124
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## 115 138 178 539 303 157 180 439 186 116 305 144
## FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 155 443 308 544 149 126 132 154 458 152 528 533
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 142 117 121 143 307 472 538 444 147 526 153 460
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 158 302 146 537 136 309 310 468 540 529 151 542
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
## 530 532 159 543 541 304 184 546 531 545 127 131
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 449 125 130 123 535 469 527 454 465 448 301 534
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 446 536 463 438 452 311 171 471 150 462 145
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
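Scanning a wall of TRUE/FALSE is error-prone, so here is a minimal sketch of a more compact check that reports only the positions that differ. The vectors `set1` and `set2` below are small made-up stand-ins for `s$set1` and `s$set2` (the real data lives in a local .Rda file), and `diff_idx` is a name introduced just for this example.

```r
# Toy named rankings standing in for s$set1 and s$set2 (hypothetical data):
# names are ID labels, values are ranks; two adjacent entries are swapped.
set1 <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
set2 <- c(a = 1, c = 3, b = 2, d = 4, e = 5)

# Positions where the ID labels disagree
diff_idx <- which(names(set1) != names(set2))
diff_idx

# Side-by-side view of labels and ranks at those positions only
data.frame(position   = diff_idx,
           set1_label = names(set1)[diff_idx],
           set2_label = names(set2)[diff_idx],
           set1_rank  = unname(set1)[diff_idx],
           set2_rank  = unname(set2)[diff_idx])
```

On the real data the same `which()` call should point at just the handful of positions that show FALSE above.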
I then run Kendall’s rank correlation test to get the tau statistic:
# conf.level is dropped: cor.test only uses it for the Pearson confidence interval
tau <- cor.test(s$set1, s$set2, method = "kendall")
summary(tau)
## Length Class Mode
## statistic 1 -none- numeric
## parameter 0 -none- NULL
## p.value 1 -none- numeric
## estimate 1 -none- numeric
## null.value 1 -none- numeric
## alternative 1 -none- character
## method 1 -none- character
## data.name 1 -none- character
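`summary()` on this object only describes the list structure. As a sketch, printing the object or pulling out individual components shows the test results themselves. The vectors `x` and `y` are made-up toy rankings, not my data, and `res` is a name introduced for this example.

```r
# Toy rankings (hypothetical): y is x with one adjacent pair swapped
x <- 1:20
y <- x
y[c(5, 6)] <- y[c(6, 5)]

res <- cor.test(x, y, method = "kendall")

# Printing an htest object gives the formatted test report
print(res)

# Individual pieces can be pulled out by name
res$estimate   # the tau estimate
res$statistic  # the test statistic
res$p.value    # the p-value
```

With one swapped pair out of 190 possible pairs, tau here is 188/190, so slightly below 1.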
And here is my silly question. I am seeing strange behavior with the p-values. When the two sets are nearly identical, the p-value comes out as exactly 0:
tau$p.value
## [1] 0
I also noticed that the “null” is specified as 0:
tau$null.value
## tau
## 0
And there isn’t much else embedded in this tau object:
str(tau)
## List of 8
## $ statistic : Named num 17.7
## ..- attr(*, "names")= chr "z"
## $ parameter : NULL
## $ p.value : num 0
## $ estimate : Named num 1
## ..- attr(*, "names")= chr "tau"
## $ null.value : Named num 0
## ..- attr(*, "names")= chr "tau"
## $ alternative: chr "two.sided"
## $ method : chr "Kendall's rank correlation tau"
## $ data.name : chr "s$set1 and s$set2"
## - attr(*, "class")= chr "htest"
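For reference, here is a minimal sketch of how a z statistic this large could end up displayed as a p-value of 0. It plugs the rounded z = 17.7 from the `str()` output into the standard normal tail. That `cor.test` derives its p-value from a normal approximation in this way, and via the complement rather than the tail directly, is an assumption on my part and may depend on the R version.

```r
z <- 17.7  # rounded z statistic copied from str(tau) above

# Complement form: pnorm(z) is indistinguishable from 1 in double precision,
# so the subtraction gives exactly 0
p_complement <- 1 - pnorm(z)

# Tail computed directly: extremely small, but still greater than 0
p_upper <- pnorm(z, lower.tail = FALSE)

# Two-sided version of the direct tail calculation
p_two_sided <- 2 * pnorm(-abs(z))

c(complement = p_complement, upper = p_upper, two_sided = p_two_sided)
```

If this is what is going on, a printed 0 would mean "smaller than the machine can represent in that calculation" rather than a literal zero.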
So I wanted to get your feedback on this: is this the package/implementation that you use, and how do you generally interpret the output and p-values? My thinking is that a p-value of exactly 0 maps to the null value (no significant difference), whereas anything slightly off zero is an actual p-value. Is that right? I am doing a bunch of these tests and then correcting for multiple comparisons, and as you can imagine, all the zeros would be interpreted as the opposite of what I think they should be. Thanks for your advice on this!
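To make the multiple-comparisons concern concrete, here is a small sketch of what a correction does with zeros. The p-values in `p_raw` are made up for illustration, and using `p.adjust` with these two methods is just one possible choice, not necessarily what I will end up using.

```r
# Hypothetical raw p-values from several tests, including exact zeros
p_raw <- c(0, 0, 0.012, 0.030, 0.200)

# Benjamini-Hochberg (false discovery rate) and Bonferroni adjustments
p_bh   <- p.adjust(p_raw, method = "BH")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Zeros stay at zero after adjustment, so they rank as the most
# significant results in the corrected set
data.frame(p_raw, p_bh, p_bonf)
```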