This method tries to explain the correlation structure of a set of predictor variables using a smaller set of linear combinations of these variables, called components. Note that components are not variables themselves; rather, they are linear combinations of the original variables.
Given a dataset with m variables, a set of k linear combinations can be used to represent it (meaning that the k components contain almost as much information as the m variables), where k << m.
Before starting the process of dimensionality reduction, one should make sure the data is standardized. This is done to avoid biasing the results with values that are too large or too small when compared to each other.
Each variable is standardized as $Z_i = \frac{X_i - \mu_i}{\sigma_{ii}}$, where $\mu_i$ equals the mean of $X_i$ and $\sigma_{ii}$ equals the standard deviation of $X_i$. In particular, every value is centered by subtracting the mean of its variable:

$$x_{i,a} = x_{i,a} - \mu_a$$
This move will facilitate the calculations down the road.
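A minimal NumPy sketch of this step; the small dataset and the variable names below are made up purely for illustration:

```python
import numpy as np

# Made-up dataset: 5 observations of 2 variables (values are only for illustration).
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

mu = X.mean(axis=0)            # mean of each variable (mu_a)
sigma = X.std(axis=0)          # standard deviation of each variable

X_centered = X - mu            # x_{i,a} = x_{i,a} - mu_a  (centering)
Z = X_centered / sigma         # full standardization: zero mean, unit variance

print(Z.mean(axis=0))          # ~0 for every column
print(Z.std(axis=0))           # 1 for every column
```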
The covariance is a measure of the degree to which two variables vary together. Positive covariance indicates that when one variable increases, the other tends to increase. Negative covariance indicates that when one variable increases, the other tends to decrease. The covariance measure is not scaled.
In a 2x2 matrix:

$$\left[ \begin{array}{cc} 2.0 & 0.8 \\ 0.8 & 0.6 \end{array} \right]$$
Since the mean ($\mu$) is equal to 0 thanks to the centering of the values in the previous step, the formula to calculate the covariance of two variables is:

$$\operatorname{cov}(x_1, x_2) = \frac{1}{n}\sum_{i=1}^{n} x_{i,1}\, x_{i,2}$$

The way to interpret covariance is to understand its result as information about how one attribute changes as the other one changes.
It is important to remember that, if we multiply a vector by the covariance matrix ($\Sigma$), the resulting vector will turn toward the direction of greatest variance.
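As a rough sketch of both points, reusing the toy data from the previous snippet: the covariance matrix can be computed directly from the centered values, and repeatedly multiplying a vector by it illustrates the "turning toward the variance" behavior:

```python
import numpy as np

# Same made-up dataset as in the standardization sketch above.
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])
Xc = X - X.mean(axis=0)          # centered data, so every mean is 0
n = Xc.shape[0]

# cov(x_j, x_k) = (1/n) * sum_i x_{i,j} * x_{i,k}
cov = (Xc.T @ Xc) / n
print(cov)
print(np.cov(X, rowvar=False, bias=True))   # NumPy's helper gives the same matrix

# Repeatedly multiplying a vector by the covariance matrix turns it toward
# the direction of greatest variance (the dominant eigenvector).
v = np.array([1.0, 0.0])
for _ in range(10):
    v = cov @ v
    v /= np.linalg.norm(v)       # renormalize so the vector does not blow up
print(v)                         # points along the direction of greatest variance
```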
Changing the units of measure would change the results. This inconvenience is addressed by calculating the correlation coefficient $r_{ij}$:
$r_{ij}$ scales the covariance by each of the standard deviations:

$$r_{ij} = \frac{\sigma_{ij}^2}{\sigma_{ii}\,\sigma_{jj}}$$

The value of $r_{ij}$ tells us how much of a correlation exists between two variables.
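A small sketch of this scaling, again on the made-up data from before, comparing the hand-rolled $r_{ij}$ with NumPy's built-in `np.corrcoef`:

```python
import numpy as np

# Same toy data as above; compare the hand-rolled r_ij with np.corrcoef.
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])
Xc = X - X.mean(axis=0)
n = len(X)

cov_12 = (Xc[:, 0] * Xc[:, 1]).sum() / n            # covariance of variables 1 and 2
r_12 = cov_12 / (X[:, 0].std() * X[:, 1].std())     # scale by both standard deviations

print(r_12)
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])          # NumPy's built-in agrees
```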
Define a new set of dimensions by:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} \quad \text{or} \quad \sigma^2 = \frac{\sum X^2}{N} - \mu^2$$

In the previous formula, $\sigma^2$ is defined as the sum of the squared distances of each term in the distribution from the mean ($\mu$), divided by the number of terms in the distribution ($N$). In simple words: $\sigma^2$ measures how far a set of random numbers is spread out from its mean. The first principal component (PC1) is the dimension along which this variance is greatest.
Once PC1 is determined, the next dimension is established by drawing a line orthogonal (perpendicular) to PC1. The exact direction of this line is determined by the same process of finding the greatest $\sigma^2$ in the remaining data; once this is done, PC2 is ready.
This is done iteratively until all the dimensions (d) of the dataset are covered and a component is generated for every single d, producing m components.
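The following sketch, still using the toy data from above, illustrates the idea: the eigenvector of the covariance matrix with the largest eigenvalue is the direction along which the projected data has the greatest $\sigma^2$, and any other direction captures less (the second direction compared here is an arbitrary choice for illustration):

```python
import numpy as np

X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False, bias=True)

eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigh: the covariance matrix is symmetric
pc1 = eigenvectors[:, np.argmax(eigenvalues)]     # eigenvector with the largest eigenvalue

other = np.array([1.0, -1.0]) / np.sqrt(2)        # some other unit-length direction

print((Xc @ pc1).var())    # variance along PC1 (equals the largest eigenvalue)
print((Xc @ other).var())  # smaller: any other direction captures less variance
```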
Eigenvectors and eigenvalues are mathematically expressed as:
$$A\vec{v} = \lambda\vec{v}$$

Where $A$ represents a transformation matrix, $\vec{v}$ is a vector, also known as the eigenvector, that comes out of the matrix being analyzed, and $\lambda$ is a scalar value, also known as the eigenvalue.
Principal components = eigenvectors with largest eigenvalues.
In order to exemplify the process of finding these values and vectors, the steps are presented for a 2x2 matrix, but this can be done with any matrix of n x n dimensions following the rules of matrix algebra.
To begin with the example we will declare a matrix:

$$A = \left[ \begin{array}{cc} 7 & 3 \\ 3 & -1 \end{array} \right]$$
Now the steps:
Multiply an n x n identity matrix by the scalar $\lambda$: $I \cdot \lambda$
$$\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \cdot \lambda = \left[ \begin{array}{cc} \lambda & 0 \\ 0 & \lambda \end{array} \right]$$
Subtract the identity matrix multiple from matrix A: $A - \lambda I$
$$\left[ \begin{array}{cc} 7 & 3 \\ 3 & -1 \end{array} \right] - \left[ \begin{array}{cc} \lambda & 0 \\ 0 & \lambda \end{array} \right] = \left[ \begin{array}{cc} 7-\lambda & 3 \\ 3 & -1-\lambda \end{array} \right]$$
Find the determinant of the matrix obtained in the previous step: $\det(A - \lambda I)$
$$\det\left[ \begin{array}{cc} 7-\lambda & 3 \\ 3 & -1-\lambda \end{array} \right] = (7-\lambda)(-1-\lambda) - (3 \cdot 3) = -7 - 7\lambda + \lambda + \lambda^2 - 9 = \lambda^2 - 6\lambda - 16$$
Solve for the values of $\lambda$ that satisfy the equation $\det(A - \lambda I) = 0$
Solving for $\lambda^2 - 6\lambda - 16 = 0$ will result in: $(\lambda - 8)(\lambda + 2) = 0$
Therefore $\lambda_1 = 8$ and $\lambda_2 = -2$; these are the eigenvalues for matrix A.
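As a quick check of these steps, a short SymPy sketch (assuming SymPy is available) reproduces the characteristic polynomial and its roots symbolically:

```python
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[7, 3],
               [3, -1]])

char_poly = (A - lam * sp.eye(2)).det()     # det(A - lambda*I)
print(sp.expand(char_poly))                 # lambda**2 - 6*lambda - 16
print(sp.solve(sp.Eq(char_poly, 0), lam))   # [-2, 8]
```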
Solve for the eigenvector corresponding to each $\lambda$
Solving for $\lambda = 8$; in this process we will call the matrix with the substituted values B.
$$\left[ \begin{array}{cc} 7-(8) & 3 \\ 3 & -1-(8) \end{array} \right] = \left[ \begin{array}{cc} -1 & 3 \\ 3 & -9 \end{array} \right]$$
We will assume the following: $B\overline{X} = 0 \;\therefore$
$$\left[ \begin{array}{cc} -1 & 3 \\ 3 & -9 \end{array} \right] \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$$
Applying row reduction $3R_1 + R_2 = R_2$ to:

$$\left[ \begin{array}{cc|r} -1 & 3 & 0 \\ 3 & -9 & 0 \end{array} \right] = \left[ \begin{array}{cc|r} -1 & 3 & 0 \\ 0 & 0 & 0 \end{array} \right] \;\therefore\; -x_1 + 3x_2 = 0$$
From the previous operation we obtain $3x_2 = x_1$. At this point we can choose a value for either $x$; we will go with $x_2 = 1$ to keep it simple.
$$3x_2 = x_1 \;\therefore\; 3(1) = x_1 \;\therefore\; x_1 = 3$$
Now we know that the eigenvalue $\lambda = 8$ corresponds to the eigenvector $\overline{X} = (3,1)$.
Solving for $\lambda = -2$, generating matrix C.
$$C = \left[ \begin{array}{cc} 7-(-2) & 3 \\ 3 & -1-(-2) \end{array} \right], \quad C\overline{X} = 0 \;\therefore$$
$$\left[ \begin{array}{cc} 9 & 3 \\ 3 & 1 \end{array} \right] \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$$
Applying row reduction $3R_2 - R_1 = R_1$:

$$\left[ \begin{array}{cc|r} 0 & 0 & 0 \\ 3 & 1 & 0 \end{array} \right] \;\therefore\; 3x_1 + x_2 = 0$$
Assigning $x_1 = 1$: $x_2 = -3x_1 \;\therefore\; x_2 = -3(1) = -3$
The eigenvalue $\lambda = -2$ corresponds to the eigenvector $\overline{X} = (1,-3)$
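A short NumPy check of the hand-computed eigenpairs; note that `np.linalg.eig` returns unit-length eigenvectors, so $(3,1)$ and $(1,-3)$ show up rescaled by $1/\sqrt{10}$ (any nonzero rescaling of an eigenvector is still an eigenvector):

```python
import numpy as np

A = np.array([[7.0, 3.0],
              [3.0, -1.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)     # 8 and -2 (NumPy does not guarantee the order)
print(eigenvectors)    # columns are unit-length eigenvectors

# The hand-computed pair satisfies A v = lambda v directly:
v = np.array([3.0, 1.0])
print(A @ v)           # [24.  8.]
print(8 * v)           # the same: lambda * v
```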
In other words, usually the 2 eigenvectors with the largest eigenvalues ($\lambda$) will be selected to represent the whole dataset as Principal Component 1 and Principal Component 2.
Finally, one (or the algorithm) has to project the data points onto this new set of dimensions so they can be analyzed.
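A minimal sketch of this projection step, reusing the toy data from the earlier snippets and keeping only PC1 (the choice of k = 1 is just for illustration):

```python
import numpy as np

# Toy data from the earlier sketches; W holds the chosen eigenvectors as columns.
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False, bias=True)

eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]        # largest eigenvalue first
k = 1                                        # keep only PC1 in this toy example
W = eigenvectors[:, order[:k]]

scores = Xc @ W                              # coordinates of each point on the new axes
print(scores)
```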
This algorithm, like all algorithms, is better suited for specific circumstances and performs poorly in others. The following list tries to summarize this idea: