A distance function, or metric, on the space \(\mathbb{R}^n,\:n\geq 1\), is a function \(d:\mathbb{R}^n\times\mathbb{R}^n\rightarrow \mathbb{R}\) satisfying the following axioms:
P1. \(d(\mathbf{x},\mathbf{y})= 0\iff \mathbf{x}=\mathbf{y}\) (identity of indiscernibles);
P2. \(d(\mathbf{x},\mathbf{y})= d(\mathbf{y},\mathbf{x})\) (symmetry);
P3. \(d(\mathbf{x},\mathbf{y})+d(\mathbf{y},\mathbf{z})\geq d(\mathbf{x},\mathbf{z})\) (triangle inequality),
where \(\mathbf{x}=(x_1,\cdots,x_n)\), \(\mathbf{y}=(y_1,\cdots,y_n)\) and \(\mathbf{z}=(z_1,\cdots,z_n)\) are any three vectors of \(\mathbb{R}^n\).
Prove that the three axioms imply the non-negativity condition: \(d(\mathbf{x},\mathbf{y})\geq 0\).
The term dissimilarity, rather than distance, should be used when not all of the axioms P1-P3 hold.
The Euclidean distance is defined by: \[d(\mathbf{x},\mathbf{y})=\sqrt{\sum_{i=1}^n (x_i-y_i)^2}.\]
The Manhattan distance is defined by: \[d(\mathbf{x},\mathbf{y}) =\sum_{i=1}^n |x_i-y_i|.\]
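As a quick illustration, here is a minimal R sketch (the two vectors are made up for the example) computing both distances directly and checking them against the built-in `dist()` function:

```r
# Two arbitrary example vectors (not taken from the text)
x <- c(1, 2, 3)
y <- c(4, 0, 3)

# Euclidean distance: square root of the sum of squared differences
sqrt(sum((x - y)^2))                      # ~3.606
dist(rbind(x, y), method = "euclidean")   # same value

# Manhattan distance: sum of absolute differences
sum(abs(x - y))                           # 5
dist(rbind(x, y), method = "manhattan")   # same value
```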
There also exists a weighted version of the Manhattan distance, called the Canberra distance:
\[d(\mathbf{x},\mathbf{y}) =\sum_{i=1}^n \frac{|x_i-y_i|}{|x_i|+|y_i|}.\]
Note that the term \(|x_i-y_i|/(|x_i|+|y_i|)\) must be replaced by zero when both \(x_i\) and \(y_i\) are zero, and that the Canberra distance is especially sensitive to small changes near zero.
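The following is a minimal R sketch of the Canberra distance (with made-up vectors), replacing the \(0/0\) terms by zero as noted above; `dist()` also offers a Canberra method, although it handles the \(0/0\) terms slightly differently (it drops them and rescales the sum):

```r
canberra <- function(x, y) {
  term <- abs(x - y) / (abs(x) + abs(y))
  term[abs(x) + abs(y) == 0] <- 0   # replace the 0/0 terms by zero
  sum(term)
}

canberra(c(0, 1, 2), c(0, 3, 2))                    # 0.5 (first term is 0/0, set to 0)
dist(rbind(c(1, 2), c(3, 2)), method = "canberra")  # 0.5, same as canberra(c(1, 2), c(3, 2))
```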
Prove that the Canberra distance is a true distance.
Both the Euclidean and Manhattan distances are special cases of the Minkowski distance, defined by:
\[d(\mathbf{x},\mathbf{y}) =\left[\sum_{i=1}^n |x_i-y_i|^{p}\right]^{1/p},\:p\geq 1. \]
Let us also define: \[\|\mathbf{x}\|_p\equiv\left[\sum_{i=1}^n |x_i|^{p}\right]^{1/p},\: p\geq 1,\]
where \(\|\cdot\|_p\) is known as the \(p\)-norm or Minkowski norm.
Note that the Minkowski distance can be defined as follows: \[ d(\mathbf{x},\mathbf{y})=\|\mathbf{x}-\mathbf{y}\|_p,\:p\geq 1. \]
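A minimal R sketch (example vectors made up) of the Minkowski distance for a few values of \(p\); `dist()` provides the same computation via `method = "minkowski"`:

```r
# Minkowski distance of order p (p >= 1)
minkowski <- function(x, y, p) sum(abs(x - y)^p)^(1 / p)

x <- c(1, 2, 3)
y <- c(4, 0, 3)
minkowski(x, y, 1)                               # 5, the Manhattan distance
minkowski(x, y, 2)                               # ~3.606, the Euclidean distance
minkowski(x, y, 3)                               # ~3.271
dist(rbind(x, y), method = "minkowski", p = 3)   # ~3.271, same value
```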
# Minkowski inequality

The proof of the triangle inequality P3 is based on the Minkowski inequality, which states that for any nonnegative real numbers \(a_1,\cdots,a_n\); \(b_1,\cdots,b_n\), we have:
\[ \left[\sum_{i=1}^n (a_i+b_i)^{p}\right]^{1/p}\leq \left[\sum_{i=1}^n a_i^{p}\right]^{1/p} + \left[\sum_{i=1}^n b_i^{p}\right]^{1/p},\:p\geq 1. \]
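As a quick numerical sanity check of the Minkowski inequality in R (the nonnegative numbers \(a_i\), \(b_i\) are arbitrary):

```r
a <- c(1, 4, 2); b <- c(3, 0, 5); p <- 3
sum((a + b)^p)^(1 / p)               # ~7.78 (left-hand side)
sum(a^p)^(1 / p) + sum(b^p)^(1 / p)  # ~9.52 (right-hand side, larger as expected)
```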
# Minkowski inequality proof

To prove that the Minkowski distance satisfies P3, notice that
\[ \sum_{i=1}^n|x_i-z_i|^{p}= \sum_{i=1}^n|(x_i-y_i)+(y_i-z_i)|^{p}. \] Noticing then that for any reals \(u,v\), we have \(|u+v|\leq |u|+|v|\), and using the fact that \(t\mapsto t^p\) is increasing for \(t>0\), we obtain
\[ \sum_{i=1}^n|x_i-z_i|^{p}\leq \sum_{i=1}^n(|x_i-y_i|+|y_i-z_i|)^{p}. \] Taking the \(1/p\) power of both sides and applying the Minkowski inequality with \(a_i=|x_i-y_i|\) and \(b_i=|y_i-z_i|\), \(i=1,\cdots,n\), we get \[ \left(\sum_{i=1}^n|x_i-z_i|^{p}\right)^{1/p}\leq \left(\sum_{i=1}^n |x_i-y_i|^{p}\right)^{1/p}+\left(\sum_{i=1}^n |y_i-z_i|^{p}\right)^{1/p}, \] that is, \(d(\mathbf{x},\mathbf{z})\leq d(\mathbf{x},\mathbf{y})+d(\mathbf{y},\mathbf{z})\).
The proof of the Minkowski inequality itself requires the Hölder inequality which states that for any nonnegative real numbers \(a_1,\cdots,a_n\); \(b_1,\cdots,b_n\), and any \(p,q>1\) with \(1/p+1/q=1\), we have
\[ \sum_{i=1}^n a_ib_i\leq \left[\sum_{i=1}^n a_i^{p}\right]^{1/p} \left[\sum_{i=1}^n b_i^{q}\right]^{1/q}. \]
# Young's inequality

The proof of the Hölder inequality above relies on Young's inequality: for any \(a,b>0\) we have \[ ab\leq \frac{a^p}{p}+\frac{b^q}{q}, \] with equality occurring iff \(a^p=b^q\). To prove Young's inequality, one can use the (strict) convexity of the exponential function, which tells us that for any reals \(x,y\),
\[ e^{\frac{x}{p}+\frac{y}{q} }\leq \frac{e^{x}}{p}+\frac{e^{y}}{q}. \] We then set \(x=p\ln a\) and \(y=q\ln b\) to get Young's inequality. A good reference on inequalities is Z. Cvetkovski, Inequalities: Theorems, Techniques and Selected Problems, Springer Science & Business Media, 2012.
Note that the triangle inequality for the Minkowski distance implies
\[ \left[\sum_{i=1}^n |x_i|^{p}\right]^{1/p}\leq \sum_{i=1}^n |x_i|,\:p\geq 1, \]
which follows by writing \(\mathbf{x}=\sum_{i=1}^n x_i\mathbf{e}_i\), where \(\mathbf{e}_i\) is the \(i\)-th standard basis vector, and applying P3 repeatedly.
Note that for \(p=2\), we have \(q=2\). The Hölder inequality implies for that special case \[ \sum_{i=1}^n|x_iy_i|\leq\sqrt{\sum_{i=1}^n x_i^2}\sqrt{\sum_{i=1}^n y_i^2}. \]
Since the LHS of the above inequality is greater than or equal to \(|\sum_{i=1}^nx_iy_i|\), we get the Cauchy-Schwarz inequality \[ \left|\sum_{i=1}^nx_iy_i\right|\leq\sqrt{\sum_{i=1}^n x_i^2}\sqrt{\sum_{i=1}^n y_i^2}. \] Using the dot product notation (also called scalar product notation) \(\mathbf{x}\cdot \mathbf{y}=\sum_{i=1}^nx_iy_i\), and the norm notation \(\|\cdot\|_2\), the Cauchy-Schwarz inequality reads:
\[|\mathbf{x}\cdot \mathbf{y} | \leq \|\mathbf{x}\|_2 \|\mathbf{y}\|_2. \]
The Pearson correlation coefficient is a similarity measure on \(\mathbb{R}^n\) defined by: \[ \rho(\mathbf{x},\mathbf{y})= \frac{\sum_{i=1}^n (x_i-\bar{\mathbf{x}})(y_i-\bar{\mathbf{y}})}{{\sqrt{\sum_{i=1}^n (x_i-\bar{\mathbf{x}})^2\sum_{i=1}^n (y_i-\bar{\mathbf{y}})^2}}}, \] where \(\bar{\mathbf{x}}=\frac{1}{n}\sum_{i=1}^n x_i\) is the mean of the entries of \(\mathbf{x}\), and similarly for \(\bar{\mathbf{y}}\).
Note that the Pearson correlation coefficient satisfies P2 and is invariant under positive scaling, i.e. \(\rho(\alpha\mathbf{x},\mathbf{y})=\rho(\mathbf{x},\mathbf{y})\) for any \(\alpha>0\).
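A minimal R sketch (with made-up data) computing the Pearson correlation from the formula above, checking it against the built-in `cor()`, and illustrating the invariance to positive scaling:

```r
x <- c(1, 3, 2, 5)
y <- c(2, 5, 6, 10)

# Pearson correlation from the definition
rho <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
rho               # ~0.930
cor(x, y)         # same value
cor(2 * x, y)     # unchanged under positive scaling
```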
# Pearson distance

The Pearson distance (or correlation distance) is defined by: \[ d(\mathbf{x},\mathbf{y})=1-\rho(\mathbf{x},\mathbf{y}).\] Note that the Pearson distance does not satisfy P1, since \(d(\mathbf{x},\alpha\mathbf{x})=0\) for any \(\alpha>0\), so two distinct vectors can be at distance zero. It does not satisfy the triangle inequality either. However, the symmetry property P2 is fulfilled.
The cosine of the angle \(\theta\) between two vectors \(\mathbf{x}\) and \(\mathbf{y}\) is a measure of similarity given by: \[ \cos(\theta)=\frac{\mathbf{x}\cdot \mathbf{y}}{\|\mathbf{x}\|_2\|\mathbf{y}\|_2}=\frac{\sum_{i=1}^n x_i y_i}{{\sqrt{\sum_{i=1}^n x_i^2\sum_{i=1}^n y_i^2}}}. \]
Note that the cosine of the angle between the two centred vectors \((x_1-\bar{\mathbf{x}},\cdots,x_n-\bar{\mathbf{x}})\) and \((y_1-\bar{\mathbf{y}},\cdots,y_n-\bar{\mathbf{y}})\) coincides with the Pearson correlation coefficient of \(\mathbf{x}\) and \(\mathbf{y}\).
The cosine correlation distance is defined by: \[ d(\mathbf{x},\mathbf{y})=1-\cos(\theta). \] It shares similar properties with the Pearson distance; in particular, axioms P1 and P3 are not satisfied.
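A minimal R sketch (arbitrary vectors) of the cosine similarity; centring both vectors first recovers the Pearson correlation, as noted above:

```r
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

x <- c(1, 3, 2, 5)
y <- c(2, 5, 6, 10)
cosine(x, y)                       # ~0.985
cosine(x - mean(x), y - mean(y))   # ~0.930, equal to cor(x, y)
```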
To calculate the Spearman rank-order correlation, or Spearman correlation coefficient for short, we map each vector separately to its vector of ranks: \(\mathbf{x}\rightarrow \mathbf{x}^r=(x_1^r,\cdots,x_n^r)\), where \(x_i^r\) is the rank of \(x_i\) among the values of \(\mathbf{x}\).
We illustrate this transformation with a simple example. If \(\mathbf{x}=(3, 1, 4, 15, 92)\), then the rank-order vector is \(\mathbf{x}^r=(2,1,3,4,5)\).
The Spearman correlation of two numerical variables \(\mathbf{x}\) and \(\mathbf{y}\) is simply the Pearson correlation of the two corresponding rank-order variables \(\mathbf{x}^r\) and \(\mathbf{y}^r\), i.e. \(\rho(\mathbf{x}^r,\mathbf{y}^r)\). This measure is useful because it is more robust against outliers than the Pearson correlation.
If all the \(n\) ranks are distinct, it can be computed using the following formula:
\[ \rho(\mathbf{x}^r,\mathbf{y}^r)=1-\frac{6\sum_{i=1}^n d_i^2}{n(n^2-1)}, \] where \(d_i=x_i^r-y_i^r,\:i=1,\cdots,n\).
The Spearman distance is then defined by: \[ d(\mathbf{x},\mathbf{y})=1-\rho(\mathbf{x^r},\mathbf{y^r}). \] It can easily be shown that it is not a proper distance.
If all the \(n\) ranks are distinct, we get: \[ d(\mathbf{x},\mathbf{y})=\frac{6\sum_{i=1}^n d_i^2}{n(n^2-1)}. \]
# Example

As an example, consider two vectors \(\mathbf{x}\) and \(\mathbf{y}\) of length \(n=5\) whose rank differences are \[d=(d_1,\cdots,d_5)=(-2,\:0,\:1,\:1,\:0).\] Then \(\sum_{i=1}^5 d_i^2=6\), and the formula gives \[ \rho(\mathbf{x}^r,\mathbf{y}^r)=1-\frac{6\times 6}{5(5^2-1)}=0.7, \] which agrees with the value 0.7 obtained by computing the Pearson correlation of the ranks directly.
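The original data vectors are not reproduced here; the following R sketch uses two hypothetical vectors chosen so that their rank differences match the ones above, and recovers the same value both from the formula and from the built-in `cor()`:

```r
# hypothetical vectors with rank(x) - rank(y) = (-2, 0, 1, 1, 0)
x <- c(3, 1, 92, 15, 2)
y <- c(10, 0, 7, 5, 1)

d <- rank(x) - rank(y)
d                                      # -2  0  1  1  0
n <- length(x)
1 - 6 * sum(d^2) / (n * (n^2 - 1))     # 0.7
cor(x, y, method = "spearman")         # 0.7
```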
The Kendall rank correlation coefficient is calculated from the numbers of concordant and discordant pairs between the rankings of \(\mathbf{x}\) and the rankings of \(\mathbf{y}\).
There are \({n \choose 2} =n(n-1)/2\) pairs in total. For any pair \(1\leq i<j\leq n\), the observations \((x_{i},x_{j})\) and \((y_{i},y_{j})\) are said to be concordant if the signs of \(x_{i}-x_{j}\) and \(y_{i}-y_{j}\) are the same; otherwise they are said to be discordant. Denoting by \(n_c\) the number of concordant pairs and by \(n_d\) the number of discordant pairs, the Kendall \(\tau\) coefficient is defined as: \[\tau =\frac {n_c-n_d}{n(n-1)/2}.\]
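A minimal R sketch (reusing the hypothetical vectors from the Spearman example) that counts concordant and discordant pairs directly and compares with `cor(..., method = "kendall")`:

```r
kendall_tau <- function(x, y) {
  n <- length(x)
  nc <- 0; nd <- 0
  # loop over all n(n-1)/2 pairs and classify each one
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
    if (s > 0) nc <- nc + 1 else if (s < 0) nd <- nd + 1
  }
  (nc - nd) / (n * (n - 1) / 2)
}

x <- c(3, 1, 92, 15, 2)
y <- c(10, 0, 7, 5, 1)
kendall_tau(x, y)                # 0.6
cor(x, y, method = "kendall")    # 0.6
```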