A distance function (or metric) on the space \(\mathbb{R}^n,\:n\geq 1\), is a function \(d:\mathbb{R}^n\times\mathbb{R}^n\rightarrow \mathbb{R}\) satisfying the following axioms:
P1. \(d(\mathbf{x},\mathbf{y})= 0\iff \mathbf{x}=\mathbf{y}\) (identity of indiscernibles);
P2. \(d(\mathbf{x},\mathbf{y})= d(\mathbf{y},\mathbf{x})\) (symmetry);
P3. \(d(\mathbf{x},\mathbf{y})+d(\mathbf{y},\mathbf{z})\geq d(\mathbf{x},\mathbf{z})\) (triangle inequality),
where \(\mathbf{x}=(x_1,\cdots,x_n)\), \(\mathbf{y}=(y_1,\cdots,y_n)\) and \(\mathbf{z}=(z_1,\cdots,z_n)\) are any three vectors of \(\mathbb{R}^n\).
Exercise 1: Prove that these three axioms imply the non-negativity condition: \(d(\mathbf{x},\mathbf{y})\geq 0\).
The term dissimilarity should be used instead of distance when not all of the distance axioms P1-P3 are satisfied.
Let us recall and introduce some useful metrics.
The Euclidean distance:
\[d(\mathbf{x},\mathbf{y})=\sqrt{\sum_{i=1}^n (x_i-y_i)^2}.\]
The Manhattan distance:
\[d(\mathbf{x},\mathbf{y}) =\sum_{i=1}^n |x_i-y_i|.\]
The Canberra distance, a weighted version of the Manhattan distance, is given by:
\[d(\mathbf{x},\mathbf{y}) =\sum_{i=1}^n \frac{|x_i-y_i|}{|x_i|+|y_i|}.\]
Note that the term \(|x_i-y_i|/(|x_i|+|y_i|)\) must be replaced by zero when both \(x_i\) and \(y_i\) are zero, and that the Canberra distance is especially sensitive to small changes near zero.
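For illustration, here is a minimal R sketch computing these three distances directly from their formulas; the vectors x and y are arbitrary example values (their third components are both zero to illustrate the 0/0 convention).
x <- c(1, 2, 0, 5)
y <- c(2, 0, 0, 3)
sqrt(sum((x - y)^2))                  # Euclidean distance
sum(abs(x - y))                       # Manhattan distance
num <- abs(x - y); den <- abs(x) + abs(y)
sum(ifelse(den == 0, 0, num / den))   # Canberra distance, 0/0 terms set to zero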
Exercise 2: Prove that the Canberra distance is a true distance.
Both the Euclidean and Manhattan distances are special cases (with \(p=2\) and \(p=1\), respectively) of the Minkowski distance, which is now defined:
\[d(\mathbf{x},\mathbf{y}) =\left[\sum_{i=1}^n |x_i-y_i|^{p}\right]^{1/p},\:p\geq 1.\]
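As an illustration, the following R sketch evaluates the Minkowski distance for a few values of \(p\) through a small helper function (minkowski, defined here for convenience); the vectors are arbitrary example values and the last line uses the built-in dist function as a check.
minkowski <- function(x, y, p) sum(abs(x - y)^p)^(1/p)
x <- c(1, 2, 0, 5)
y <- c(2, 0, 0, 3)
minkowski(x, y, 1)                               # Manhattan distance
minkowski(x, y, 2)                               # Euclidean distance
minkowski(x, y, 3)
dist(rbind(x, y), method = "minkowski", p = 3)   # built-in check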
Let us also define: \[\|\mathbf{x}\|_p\equiv\left[\sum_{i=1}^n |x_i|^{p}\right]^{1/p},\: p\geq 1,\] where \(\|\cdot\|_p\) is known as the p-norm or Minkowski norm. Note that the Minkowski distance can thus be written as: \[ d(\mathbf{x},\mathbf{y})=\|\mathbf{x}-\mathbf{y}\|_p,\:p\geq 1. \] The proof of the triangle inequality P3 is based on the Minkowski inequality, which states that for any nonnegative real numbers \(a_1,\cdots,a_n\); \(b_1,\cdots,b_n\), we have:
\[ \left[\sum_{i=1}^n (a_i+b_i)^{p}\right]^{1/p}\leq \left[\sum_{i=1}^n a_i^{p}\right]^{1/p} + \left[\sum_{i=1}^n b_i^{p}\right]^{1/p},\:p\geq 1. \] To prove that the Minkowski distance satisfies P3, notice that
\[ \sum_{i=1}^n|x_i-z_i|^{p}= \sum_{i=1}^n|(x_i-y_i)+(y_i-z_i)|^{p}. \] Since \(|x+y|\leq |x|+|y|\) for any reals \(x,y\), and since \(a\mapsto a^p\) is increasing for \(a\geq 0\) when \(p\geq 1\), we obtain
\[ \sum_{i=1}^n|x_i-z_i|^{p}\leq \sum_{i=1}^n(|x_i-y_i|+|y_i-z_i|)^{p}. \] Taking the \(p\)-th root of both sides and applying the Minkowski inequality to the right-hand side with \(a_i=|x_i-y_i|\) and \(b_i=|y_i-z_i|\), \(i=1,\cdots,n\), we get \[ \left(\sum_{i=1}^n|x_i-z_i|^{p}\right)^{1/p}\leq \left(\sum_{i=1}^n |x_i-y_i|^{p}\right)^{1/p}+\left(\sum_{i=1}^n |y_i-z_i|^{p}\right)^{1/p}, \] that is, \(d(\mathbf{x},\mathbf{z})\leq d(\mathbf{x},\mathbf{y})+d(\mathbf{y},\mathbf{z})\).
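This can also be checked numerically in R; the vectors x, y, z, the exponent p and the helper function mink below are arbitrary illustrative choices.
mink <- function(u, v, p) sum(abs(u - v)^p)^(1/p)
x <- c(1, 2, 0); y <- c(4, -1, 2); z <- c(0, 3, 5)
p <- 3
mink(x, z, p) <= mink(x, y, p) + mink(y, z, p)   # triangle inequality P3: should be TRUE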
The proof of the Minkowski inequality itself requires the Hölder inequality which states that for any nonnegative real numbers \(a_1,\cdots,a_n\); \(b_1,\cdots,b_n\), and any \(p,q>1\) with \(1/p+1/q=1\), we have
\[ \sum_{i=1}^n a_ib_i\leq \left[\sum_{i=1}^n a_i^{p}\right]^{1/p} \left[\sum_{i=1}^n b_i^{q}\right]^{1/q}. \] The proof of the above Hölder inequality relies on Young's inequality: for any \(a,b>0\) we have \[ ab\leq \frac{a^p}{p}+\frac{b^q}{q}, \] with equality occurring if and only if \(a^p=b^q\). To prove Young's inequality, one can use the (strict) convexity of the exponential function, which tells us that for any reals \(x,y\),
\[ e^{\frac{x}{p}+\frac{y}{q} }\leq \frac{e^{x}}{p}+\frac{e^{y}}{q}. \] Setting \(x=p\ln a\) and \(y=q\ln b\), the left-hand side becomes \(e^{\ln a+\ln b}=ab\) and the right-hand side becomes \(a^p/p+b^q/q\), which is Young's inequality. A good reference on inequalities is: Z. Cvetkovski, Inequalities: Theorems, Techniques and Selected Problems, Springer Science & Business Media, 2012.
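Both inequalities are easy to check numerically in R; the values of a, b, u, v and p below are arbitrary illustrative choices.
a <- c(1, 2, 3); b <- c(4, 0.5, 2)
p <- 3; q <- p / (p - 1)                          # conjugate exponents: 1/p + 1/q = 1
sum(a * b) <= sum(a^p)^(1/p) * sum(b^q)^(1/q)     # Hölder inequality: should be TRUE
u <- 2; v <- 5
u * v <= u^p / p + v^q / q                        # Young's inequality: should be TRUE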
Note that the triangle inequality for the Minkowski distance implies
\[ \left[\sum_{i=1}^n |x_i|^{p}\right]^{1/p}\leq \sum_{i=1}^n |x_i|,\:p\geq 1, \]
which follows by applying it repeatedly to the decomposition \(\mathbf{x}=\sum_{i=1}^n x_i\mathbf{e}_i\), where \(\mathbf{e}_i\) denotes the \(i\)-th canonical basis vector.
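For instance, the following one-line R check (with arbitrary example values) illustrates this inequality.
x <- c(1, -2, 3, 0.5); p <- 3
sum(abs(x)^p)^(1/p) <= sum(abs(x))   # should be TRUE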
Note that for \(p=2\), we have \(q=2\). For that special case, the Hölder inequality implies \[ \sum_{i=1}^n|x_iy_i|\leq\sqrt{\sum_{i=1}^n x_i^2}\sqrt{\sum_{i=1}^n y_i^2}. \]
Since the LHS of the above inequality is greater than or equal to \(|\sum_{i=1}^nx_iy_i|\), we get the Cauchy-Schwarz inequality \[ \left|\sum_{i=1}^nx_iy_i\right|\leq\sqrt{\sum_{i=1}^n x_i^2}\sqrt{\sum_{i=1}^n y_i^2}. \] Using the dot product notation (also called the scalar product notation) \(\mathbf{x}\cdot \mathbf{y}=\sum_{i=1}^nx_iy_i\), and the norm notation \(\|\cdot\|_2\), the Cauchy-Schwarz inequality reads:
\[|\mathbf{x}\cdot \mathbf{y} | \leq \|\mathbf{x}\|_2 \, \|\mathbf{y}\|_2. \]
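A quick numerical check in R, with arbitrary example vectors:
x <- c(1, -2, 3); y <- c(4, 0, -1)
abs(sum(x * y)) <= sqrt(sum(x^2)) * sqrt(sum(y^2))   # should be TRUE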
The cosine of the angle \(\theta\) between two vectors \(\mathbf{x}\) and \(\mathbf{y}\) is a measure of similarity given by: \[ \cos(\theta)=\frac{\mathbf{x}\cdot \mathbf{y}}{\|\mathbf{x}\|_2\|\mathbf{y}\|_2}=\frac{\sum_{i=1}^n x_i y_i}{\sqrt{\sum_{i=1}^n x_i^2\sum_{i=1}^n y_i^2}}. \] Note that the cosine of the angle between the two centred vectors \((x_1-\bar{\mathbf{x}},\cdots,x_n-\bar{\mathbf{x}})\) and \((y_1-\bar{\mathbf{y}},\cdots,y_n-\bar{\mathbf{y}})\), where \(\bar{\mathbf{x}}\) and \(\bar{\mathbf{y}}\) denote the means of the components of \(\mathbf{x}\) and \(\mathbf{y}\), coincides with the Pearson correlation coefficient of \(\mathbf{x}\) and \(\mathbf{y}\).
The cosine correlation distance is defined by: \[ d(\mathbf{x},\mathbf{y})=1-\cos(\theta). \] It shares similar properties with the Pearson correlation distance; in particular, axioms P1 and P3 are likewise not satisfied.
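The following R sketch computes the cosine similarity and the cosine correlation distance, and illustrates the link with the Pearson correlation given by the built-in cor function; the vectors x and y are arbitrary example values.
x <- c(1, 2, 0, 5)
y <- c(2, 0, 1, 3)
cos_sim <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
cos_sim                                              # cosine similarity
1 - cos_sim                                          # cosine correlation distance
xc <- x - mean(x); yc <- y - mean(y)                 # centred vectors
sum(xc * yc) / (sqrt(sum(xc^2)) * sqrt(sum(yc^2)))   # cosine of centred vectors
cor(x, y)                                            # Pearson correlation: same value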
To calculate the Spearman rank-order correlation, or Spearman correlation coefficient for short, we need to map each of the vectors separately to ranked data values: \(\mathbf{x}\rightarrow \mathbf{x}^r=(x_1^r,\cdots,x_n^r)\). Here, \(x_i^r\) is the rank of \(x_i\) among the set of values of \(\mathbf{x}\). We illustrate this transformation with a simple example. If \(\mathbf{x}=(3, 1, 4, 15, 92)\), then the rank-order vector is \(\mathbf{x}^r=(2,1,3,4,5)\).
x <- c(3, 1, 4, 15, 92)
rank(x)  # returns 2 1 3 4 5
The Spearman correlation of two numerical variables \(\mathbf{x}\) and \(\mathbf{y}\) is simply the Pearson correlation of the two corresponding rank-order variables \(\mathbf{x}^r\) and \(\mathbf{y}^r\), i.e. \(\rho(\mathbf{x}^r,\mathbf{y}^r)\). This measure is useful because it is more robust against outliers than the Pearson correlation. The Spearman distance is then defined by: \[ d(\mathbf{x},\mathbf{y})=1-\rho(\mathbf{x^r},\mathbf{y^r}).\]
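In R, the Spearman correlation and the Spearman distance can be obtained as follows; the vector y is an arbitrary example value and cor is the built-in correlation function.
x <- c(3, 1, 4, 15, 92)
y <- c(10, 2, 7, 30, 25)
cor(rank(x), rank(y))                # Pearson correlation of the rank vectors
cor(x, y, method = "spearman")       # same value, computed directly
1 - cor(x, y, method = "spearman")   # Spearman distance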