A distance function (or metric) on the space \(\mathbb{R}^n,\:n\geq 1\), is a function \(d:\mathbb{R}^n\times\mathbb{R}^n\rightarrow \mathbb{R}\) satisfying the following axioms:
P1. \(d(\mathbf{x},\mathbf{y})= 0\iff \mathbf{x}=\mathbf{y}\) (identity of indiscernibles);
P2. \(d(\mathbf{x},\mathbf{y})= d(\mathbf{y},\mathbf{x})\) (symmetry);
P3. \(d(\mathbf{x},\mathbf{y})+d(\mathbf{y},\mathbf{z})\geq d(\mathbf{x},\mathbf{z})\) (triangle inequality),
where \(\mathbf{x}=(x_1,\cdots,x_n)\), \(\mathbf{y}=(y_1,\cdots,y_n)\) and \(\mathbf{z}=(z_1,\cdots,z_n)\) are any three vectors of \(\mathbb{R}^n\).
Exercise 1: Prove that these three axioms imply the non-negativity condition: \(d(\mathbf{x},\mathbf{y})\geq 0\).
The term dissimilarity should be used instead of distance when not all of the distance axioms P1-P3 are satisfied.
Let us recall and introduce some useful metrics.
The Euclidean distance:
\[d(\mathbf{x},\mathbf{y})=\sqrt{\sum_{i=1}^n (x_i-y_i)^2}.\]
The Manhattan distance:
\[d(\mathbf{x},\mathbf{y}) =\sum_{i=1}^n |x_i-y_i|.\]
The Canberra distance, a weighted version of the Manhattan distance, is given by:
\[d(\mathbf{x},\mathbf{y}) =\sum_{i=1}^n \frac{|x_i-y_i|}{|x_i|+|y_i|}.\]
Note that the term \(|x_i-y_i|/(|x_i|+|y_i|)\) must be replaced by zero when both \(x_i\) and \(y_i\) are zero, and that the Canberra distance is especially sensitive to small changes near zero.
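For illustration, here is a minimal R sketch computing these three distances directly from their formulas; the vectors x and y are arbitrary example values (their third components are both zero to illustrate the 0/0 convention).
x <- c(1, 2, 0, 5)
y <- c(2, 0, 0, 3)
sqrt(sum((x - y)^2))                  # Euclidean distance
sum(abs(x - y))                       # Manhattan distance
num <- abs(x - y); den <- abs(x) + abs(y)
sum(ifelse(den == 0, 0, num / den))   # Canberra distance, 0/0 terms set to zero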
Exercise 2: Prove that the Canberra distance is a true distance.
Both the Euclidean and Manhattan distances are special cases (with \(p=2\) and \(p=1\), respectively) of the Minkowski distance, which is now defined:
\[d(\mathbf{x},\mathbf{y}) =\left[\sum_{i=1}^n |x_i-y_i|^{p}\right]^{1/p},\:p\geq 1.\]
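As an illustration, the following R sketch evaluates the Minkowski distance for a few values of \(p\) through a small helper function (minkowski, defined here for convenience); the vectors are arbitrary example values and the last line uses the built-in dist function as a check.
minkowski <- function(x, y, p) sum(abs(x - y)^p)^(1/p)
x <- c(1, 2, 0, 5)
y <- c(2, 0, 0, 3)
minkowski(x, y, 1)                               # Manhattan distance
minkowski(x, y, 2)                               # Euclidean distance
minkowski(x, y, 3)
dist(rbind(x, y), method = "minkowski", p = 3)   # built-in check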
Let us also define: \[\|\mathbf{x}\|_p\equiv\left[\sum_{i=1}^n |x_i|^{p}\right]^{1/p},\: p\geq 1,\] where \(\|\cdot\|_p\) is known as the p-norm or Minkowski norm. Note that the Minkowski distance can thus be written as: \[ d(\mathbf{x},\mathbf{y})=\|\mathbf{x}-\mathbf{y}\|_p,\:p\geq 1. \] The proof of the triangle inequality P3 is based on the Minkowski inequality, which states that for any nonnegative real numbers \(a_1,\cdots,a_n\); \(b_1,\cdots,b_n\), we have:
\[ \left[\sum_{i=1}^n (a_i+b_i)^{p}\right]^{1/p}\leq \left[\sum_{i=1}^n a_i^{p}\right]^{1/p} + \left[\sum_{i=1}^n b_i^{p}\right]^{1/p},\:p\geq 1. \] To prove that the Minkowski distance satisfies P3, notice that
\[ \sum_{i=1}^n|x_i-z_i|^{p}= \sum_{i=1}^n|(x_i-y_i)+(y_i-z_i)|^{p}. \] Since \(|x+y|\leq |x|+|y|\) for any reals \(x,y\), and since \(a\mapsto a^p\) is increasing for \(a\geq 0\) when \(p\geq 1\), we obtain
\[ \sum_{i=1}^n|x_i-z_i|^{p}\leq \sum_{i=1}^n(|x_i-y_i|+|y_i-z_i|)^{p}. \] Taking the \(p\)-th root of both sides and applying the Minkowski inequality to the right-hand side with \(a_i=|x_i-y_i|\) and \(b_i=|y_i-z_i|\), \(i=1,\cdots,n\), we get \[ \left(\sum_{i=1}^n|x_i-z_i|^{p}\right)^{1/p}\leq \left(\sum_{i=1}^n |x_i-y_i|^{p}\right)^{1/p}+\left(\sum_{i=1}^n |y_i-z_i|^{p}\right)^{1/p}, \] that is, \(d(\mathbf{x},\mathbf{z})\leq d(\mathbf{x},\mathbf{y})+d(\mathbf{y},\mathbf{z})\).
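This can also be checked numerically in R; the vectors x, y, z, the exponent p and the helper function mink below are arbitrary illustrative choices.
mink <- function(u, v, p) sum(abs(u - v)^p)^(1/p)
x <- c(1, 2, 0); y <- c(4, -1, 2); z <- c(0, 3, 5)
p <- 3
mink(x, z, p) <= mink(x, y, p) + mink(y, z, p)   # triangle inequality P3: should be TRUE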
The proof of the Minkowski inequality itself requires the Hölder inequality which states that for any nonnegative real numbers \(a_1,\cdots,a_n\); \(b_1,\cdots,b_n\), and any \(p,q>1\) with \(1/p+1/q=1\), we have
\[ \sum_{i=1}^n a_ib_i\leq \left[\sum_{i=1}^n a_i^{p}\right]^{1/p} \left[\sum_{i=1}^n b_i^{q}\right]^{1/q}. \] The proof of the above Hölder inequality relies on Young's inequality: for any \(a,b>0\) we have \[ ab\leq \frac{a^p}{p}+\frac{b^q}{q}, \] with equality occurring if and only if \(a^p=b^q\). To prove Young's inequality, one can use the (strict) convexity of the exponential function, which tells us that for any reals \(x,y\),
\[ e^{\frac{x}{p}+\frac{y}{q} }\leq \frac{e^{x}}{p}+\frac{e^{y}}{q}. \] Setting \(x=p\ln a\) and \(y=q\ln b\), the left-hand side becomes \(e^{\ln a+\ln b}=ab\) and the right-hand side becomes \(a^p/p+b^q/q\), which is Young's inequality. A good reference on inequalities is: Z. Cvetkovski, Inequalities: Theorems, Techniques and Selected Problems, Springer Science & Business Media, 2012.
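Both inequalities are easy to check numerically in R; the values of a, b, u, v and p below are arbitrary illustrative choices.
a <- c(1, 2, 3); b <- c(4, 0.5, 2)
p <- 3; q <- p / (p - 1)                          # conjugate exponents: 1/p + 1/q = 1
sum(a * b) <= sum(a^p)^(1/p) * sum(b^q)^(1/q)     # Hölder inequality: should be TRUE
u <- 2; v <- 5
u * v <= u^p / p + v^q / q                        # Young's inequality: should be TRUE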
Note that the triangle inequality for the Minkowski distance implies
\[ \left[\sum_{i=1}^n |x_i|^{p}\right]^{1/p}\leq \sum_{i=1}^n |x_i|,\:p\geq 1, \]
which follows by applying it repeatedly to the decomposition \(\mathbf{x}=\sum_{i=1}^n x_i\mathbf{e}_i\), where \(\mathbf{e}_i\) denotes the \(i\)-th canonical basis vector.
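For instance, the following one-line R check (with arbitrary example values) illustrates this inequality.
x <- c(1, -2, 3, 0.5); p <- 3
sum(abs(x)^p)^(1/p) <= sum(abs(x))   # should be TRUE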
Note that for \(p=2\), we have \(q=2\). For that special case, the Hölder inequality implies \[ \sum_{i=1}^n|x_iy_i|\leq\sqrt{\sum_{i=1}^n x_i^2}\sqrt{\sum_{i=1}^n y_i^2}. \]
Since the LHS of the above inequality is greater than or equal to \(|\sum_{i=1}^nx_iy_i|\), we get the Cauchy-Schwarz inequality \[ \left|\sum_{i=1}^nx_iy_i\right|\leq\sqrt{\sum_{i=1}^n x_i^2}\sqrt{\sum_{i=1}^n y_i^2}. \] Using the dot product notation (also called the scalar product notation) \(\mathbf{x}\cdot \mathbf{y}=\sum_{i=1}^nx_iy_i\), and the norm notation \(\|\cdot\|_2\), the Cauchy-Schwarz inequality reads:
\[|\mathbf{x}\cdot \mathbf{y} | \leq \|\mathbf{x}\|_2 \, \|\mathbf{y}\|_2. \]
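A quick numerical check in R, with arbitrary example vectors:
x <- c(1, -2, 3); y <- c(4, 0, -1)
abs(sum(x * y)) <= sqrt(sum(x^2)) * sqrt(sum(y^2))   # should be TRUE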
The cosine of the angle \(\theta\) between two vectors \(\mathbf{x}\) and \(\mathbf{y}\) is a measure of similarity given by: \[ \cos(\theta)=\frac{\mathbf{x}\cdot \mathbf{y}}{\|\mathbf{x}\|_2\|\mathbf{y}\|_2}=\frac{\sum_{i=1}^n x_i y_i}{\sqrt{\sum_{i=1}^n x_i^2\sum_{i=1}^n y_i^2}}. \] Note that the cosine of the angle between the two centred vectors \((x_1-\bar{\mathbf{x}},\cdots,x_n-\bar{\mathbf{x}})\) and \((y_1-\bar{\mathbf{y}},\cdots,y_n-\bar{\mathbf{y}})\), where \(\bar{\mathbf{x}}\) and \(\bar{\mathbf{y}}\) denote the means of the components of \(\mathbf{x}\) and \(\mathbf{y}\), coincides with the Pearson correlation coefficient of \(\mathbf{x}\) and \(\mathbf{y}\).
The cosine correlation distance is defined by: \[ d(\mathbf{x},\mathbf{y})=1-\cos(\theta). \] It shares similar properties with the Pearson correlation distance; in particular, axioms P1 and P3 are likewise not satisfied.
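The following R sketch computes the cosine similarity and the cosine correlation distance, and illustrates the link with the Pearson correlation given by the built-in cor function; the vectors x and y are arbitrary example values.
x <- c(1, 2, 0, 5)
y <- c(2, 0, 1, 3)
cos_sim <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
cos_sim                                              # cosine similarity
1 - cos_sim                                          # cosine correlation distance
xc <- x - mean(x); yc <- y - mean(y)                 # centred vectors
sum(xc * yc) / (sqrt(sum(xc^2)) * sqrt(sum(yc^2)))   # cosine of centred vectors
cor(x, y)                                            # Pearson correlation: same value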
To calculate the Spearman rank-order correlation, or Spearman correlation coefficient for short, we need to map each of the vectors separately to ranked data values: \(\mathbf{x}\rightarrow \mathbf{x}^r=(x_1^r,\cdots,x_n^r)\). Here, \(x_i^r\) is the rank of \(x_i\) among the set of values of \(\mathbf{x}\). We illustrate this transformation with a simple example. If \(\mathbf{x}=(3, 1, 4, 15, 92)\), then the rank-order vector is \(\mathbf{x}^r=(2,1,3,4,5)\).
x <- c(3, 1, 4, 15, 92)
rank(x)  # returns 2 1 3 4 5
The Spearman correlation of two numerical variables \(\mathbf{x}\) and \(\mathbf{y}\) is simply the Pearson correlation of the two corresponding rank-order variables \(\mathbf{x}^r\) and \(\mathbf{y}^r\), i.e. \(\rho(\mathbf{x}^r,\mathbf{y}^r)\). This measure is useful because it is more robust against outliers than the Pearson correlation. The Spearman distance is then defined by: \[ d(\mathbf{x},\mathbf{y})=1-\rho(\mathbf{x^r},\mathbf{y^r}).\]
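In R, the Spearman correlation and the Spearman distance can be obtained as follows; the vector y is an arbitrary example value and cor is the built-in correlation function.
x <- c(3, 1, 4, 15, 92)
y <- c(10, 2, 7, 30, 25)
cor(rank(x), rank(y))                # Pearson correlation of the rank vectors
cor(x, y, method = "spearman")       # same value, computed directly
1 - cor(x, y, method = "spearman")   # Spearman distance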