1 Measures of dissimilarities

1.1 Classical distances

A distance function, or metric, on the space \(\mathbb{R}^n,\:n\geq 1\), is a function \(d:\mathbb{R}^n\times\mathbb{R}^n\rightarrow \mathbb{R}\) that satisfies the following three axioms:

P1. \(d(\mathbf{x},\mathbf{y})= 0\iff \mathbf{x}=\mathbf{y}\) (identity of indiscernibles);

P2. \(d(\mathbf{x},\mathbf{y})= d(\mathbf{y},\mathbf{x})\) (symmetry);

P3. \(d(\mathbf{x},\mathbf{y})+d(\mathbf{y},\mathbf{z})\geq d(\mathbf{x},\mathbf{z})\) (triangle inequality),

where \(\mathbf{x}=(x_1,\cdots,x_n)\), \(\mathbf{y}=(y_1,\cdots,y_n)\) and \(\mathbf{z}=(z_1,\cdots,z_n)\) are any three vectors of \(\mathbb{R}^n\).

These three axioms imply the non-negativity condition: \(d(\mathbf{x},\mathbf{y})\geq 0\).
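To see why, apply P3 to the triple \(\mathbf{x},\mathbf{y},\mathbf{x}\) and then use P1 and P2:

\[ 0 = d(\mathbf{x},\mathbf{x})\leq d(\mathbf{x},\mathbf{y})+d(\mathbf{y},\mathbf{x}) = 2\,d(\mathbf{x},\mathbf{y}), \]

so that \(d(\mathbf{x},\mathbf{y})\geq 0\) for any two vectors \(\mathbf{x}\) and \(\mathbf{y}\).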

We should use the term dissimilarity rather than distance when not all of these axioms hold.

Let us introduce some popular metrics:

  1. Euclidean distance:

\[d(\mathbf{x},\mathbf{y})=\sqrt{\sum_{i=1}^n (x_i-y_i)^2}.\]

  2. Manhattan distance:

\[d(\mathbf{x},\mathbf{y}) =\sum_{i=1}^n |x_i-y_i|.\]

A weighted version of the Manhattan distance also exists:

  3. Canberra distance:

\[d(\mathbf{x},\mathbf{y}) =\sum_{i=1}^n \frac{|x_i-y_i|}{|x_i|+|y_i|}.\]

Note that the term \(|x_i-y_i|/(|x_i|+|y_i|)\) must be replaced by zero when both \(x_i\) and \(y_i\) are zero, and that the Canberra distance is especially sensitive to small changes near zero.
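The zero convention and the sensitivity near zero can be illustrated with a minimal Python sketch. The function name `canberra` and the sample vectors are chosen here for illustration only.

```python
def canberra(x, y):
    """Canberra distance; a term with x_i = y_i = 0 is counted as zero."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for xi, yi in zip(x, y):
        denom = abs(xi) + abs(yi)
        if denom > 0:  # skip the 0/0 term, i.e. replace it by zero
            total += abs(xi - yi) / denom
    return total


# The first coordinate is 0 in both vectors and contributes nothing;
# the second coordinate contributes |1 - 2| / (1 + 2) = 1/3.
print(canberra([0.0, 1.0], [0.0, 2.0]))

# A tiny absolute change near zero contributes just as much:
# |0.001 - 0.002| / (0.001 + 0.002) = 1/3.
print(canberra([0.001, 1.0], [0.002, 1.0]))
```

Both calls return about 1/3, although the absolute change in the second example is a thousand times smaller than in the first.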

Exercise: Prove that the Canberra distance is a true distance.

Both the Euclidean and Manhattan distances are special cases of the more general Minkowski distance (with \(p=2\) and \(p=1\), respectively).

  4. Minkowski distance:

\[ d(\mathbf{x},\mathbf{y}) = \left[\sum_{i=1}^n |x_i-y_i|^{p}\right]^{1/p},\: p\geq 1. \]
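As a quick check of the two special cases, the formula can be evaluated directly. This is a minimal Python sketch; the function name `minkowski` and the sample vectors are illustrative assumptions.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance of order p >= 1 between two equal-length vectors."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)


x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]

# p = 1 gives the Manhattan distance: 3 + 4 + 0 = 7.
print(minkowski(x, y, p=1))

# p = 2 gives the Euclidean distance: sqrt(9 + 16 + 0) = 5.
print(minkowski(x, y, p=2))
```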

Let us define \[\|\mathbf{x}\|_p\equiv\left[\sum_{i=1}^n |x_i|^{p}\right]^{1/p},\: p\geq 1,\] where \(\|\mathbf{\cdot}\|_p\) is known as the p-norm or Minkowski norm.

Note that the Minkowski distance can be written as

\[ d(\mathbf{x},\mathbf{y})=\|\mathbf{x}-\mathbf{y}\|_p,\:p\geq 1. \]

The proof of the triangle inequality is based on the Minkowski inequality, which states that for any nonnegative real numbers \(a_1,\cdots,a_n\); \(b_1,\cdots,b_n\), we have

\[ \left[\sum_{i=1}^n (a_i+b_i)^{p}\right]^{1/p}\leq \left[\sum_{i=1}^n a_i^{p}\right]^{1/p} + \left[\sum_{i=1}^n b_i^{p}\right]^{1/p},\:p\geq 1. \]

To prove that the Minkowski distance satisfies P3, notice that

\[ \sum_{i=1}^n|x_i-z_i|^{p}= \sum_{i=1}^n|(x_i-y_i)+(y_i-z_i)|^{p}. \]

Since \(|x+y|\leq |x|+|y|\) for any reals \(x,y\), and since \(a\mapsto a^p\) is increasing for \(a\geq 0\), we obtain

\[ \sum_{i=1}^n|x_i-z_i|^{p}\leq \sum_{i=1}^n(|x_i-y_i|+|y_i-z_i|)^{p}. \]

Taking the \(p\)-th root of both sides and then applying the Minkowski inequality to the right-hand side with \(a_i=|x_i-y_i|\) and \(b_i=|y_i-z_i|\), \(i=1,\cdots,n\), we get

\[ \left(\sum_{i=1}^n|x_i-z_i|^{p}\right)^{1/p}\leq \left(\sum_{i=1}^n |x_i-y_i|^{p}\right)^{1/p}+\left(\sum_{i=1}^n |y_i-z_i|^{p}\right)^{1/p}, \]

that is, \(d(\mathbf{x},\mathbf{z})\leq d(\mathbf{x},\mathbf{y})+d(\mathbf{y},\mathbf{z})\).
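The triangle inequality for the Minkowski distance can also be spot-checked numerically. The sketch below is only an illustration, not a proof; the helper name `minkowski_raw`, the random seed, the dimension and the sample points are all assumptions made here. The helper deliberately accepts any \(p>0\), so that the same formula can be tried with \(p<1\), where the triangle inequality can fail.

```python
import random


def minkowski_raw(x, y, p):
    """Minkowski formula for any p > 0 (a metric only when p >= 1)."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)


random.seed(0)
n, p = 5, 3.0
violations = 0
for _ in range(1000):
    # Three random vectors of R^n with coordinates in [-1, 1].
    x, y, z = ([random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(3))
    lhs = minkowski_raw(x, z, p)
    rhs = minkowski_raw(x, y, p) + minkowski_raw(y, z, p)
    if lhs > rhs + 1e-12:  # small tolerance for floating-point rounding
        violations += 1
print(violations)  # no violation is expected for p >= 1

# With p = 1/2 the same formula breaks P3 on a simple triple:
x, y, z = [0.0, 0.0], [1.0, 0.0], [1.0, 1.0]
d_xz = minkowski_raw(x, z, 0.5)  # (1 + 1)^2 = 4
d_xy = minkowski_raw(x, y, 0.5)  # 1
d_yz = minkowski_raw(y, z, 0.5)  # 1
print(d_xz > d_xy + d_yz)
```

The last comparison shows why the condition \(p\geq 1\) appears in the definition.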

The proof of the Minkowski inequality requires the Hölder inequality which states that for any nonnegative real numbers \(a_1,\cdots,a_n\); \(b_1,\cdots,b_n\), and any \(p,q>1\) with \(1/p+1/q=1\), we have

\[ \sum_{i=1}^n a_ib_i\leq \left[\sum_{i=1}^n a_i^{p}\right]^{1/p} \left[\sum_{i=1}^n b_i^{q}\right]^{1/q}. \]

The proof of the Hölder inequality relies on Young’s inequality: for any \(a,b>0\) and any \(p,q>1\) with \(1/p+1/q=1\), we have

\[ ab\leq \frac{a^p}{p}+\frac{b^q}{q}. \]

Equality occurs iff \(a^p=b^q\). To prove Young’s inequality, one can use the (strict) convexity of the exponential function, which tells us that for any reals \(x,y\),

\[ e^{\frac{x}{p}+\frac{y}{q} }\leq \frac{e^{x}}{p}+\frac{e^{y}}{q}. \]

We then set \(x=p\ln a\) and \(y=q\ln b\) to get Young’s inequality. A good reference on inequalities is:

Z. Cvetkovski, Inequalities: theorems, techniques and selected problems, 2012, Springer Science & Business Media.

Note that the Hölder inequality, applied with \(a_i=|x_i|\) and \(b_i=1\), implies

\[ \sum_{i=1}^n |x_i|\leq n^{1-1/p}\left[\sum_{i=1}^n |x_i|^{p}\right]^{1/p},\:p\geq 1, \]

where for \(p=1\) the two sides coincide.

Note that for \(p=2\), we have \(q=2\). The Hölder inequality implies for that special case \[ \sum_{i=1}^n|x_iy_i|\leq\sqrt{\sum_{i=1}^n|x_i|^2}\sqrt{\sum_{i=1}^n|y_i|^2}. \]

Since the left-hand side of the above inequality is greater than or equal to \(\left|\sum_{i=1}^nx_iy_i\right|\), we get the Cauchy-Schwarz inequality

\[ \left|\sum_{i=1}^nx_iy_i\right|\leq\sqrt{\sum_{i=1}^n|x_i|^2}\sqrt{\sum_{i=1}^n|y_i|^2}. \]

1.2 Correlation-based distances

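  1. Pearson correlation distance, commonly defined as one minus the Pearson correlation coefficient: \[ d(\mathbf{x},\mathbf{y}) = 1-\frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2\sum_{i=1}^n (y_i-\bar{y})^2}},\] where \(\bar{x}\) and \(\bar{y}\) denote the means of the components of \(\mathbf{x}\) and \(\mathbf{y}\).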

  2. Eisen cosine correlation distance

  3. Spearman correlation distance

  4. Kendall correlation distance
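As an illustration of the first item, here is a minimal Python sketch of a Pearson correlation distance. It assumes the common convention \(d(\mathbf{x},\mathbf{y})=1-r\), where \(r\) is the Pearson correlation coefficient; other conventions exist, and the function name `pearson_distance` is hypothetical.

```python
import math


def pearson_distance(x, y):
    """1 - r, with r the Pearson correlation (assumed convention)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    denom = math.sqrt(sxx * syy)
    if denom == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - sxy / denom


# Perfectly correlated vectors are at distance 0, even though they differ.
print(pearson_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))

# Perfectly anti-correlated vectors are at the maximal distance 2.
print(pearson_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

The first example shows that, under this convention, two distinct vectors can be at distance zero, so P1 does not hold and the term dissimilarity is the more accurate one.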
