Principal Component Analysis (PCA) is a method for reducing the number of variables in a data set down to a few new variables, the principal components, that hold most of the information the original variables had. It’s often nice to have a simpler data set, although this may reduce the accuracy of the model.
So, it’s basically a nice thing to try if you have a lot of extra variables and want to simplify things without completely getting rid of them.
PCA works by combining all of the variables into new terms, each a weighted combination of the old ones, and keeping only the best of these. This way you still retain elements of all the original variables but end up with fewer of them in the end. As a bonus, this can deal with multicollinearity: the final components are all uncorrelated with each other.
Without going into too much detail of how PCA works mathematically, the first thing to do is standardize all of the variables (i.e., subtract the mean and divide by the standard deviation). Second, build a matrix of how the variables relate to one another (the covariance matrix). Next, compute the eigenvalues and eigenvectors of this matrix and sort them from greatest to least by eigenvalue. One can now choose how many of the principal components (eigenvectors) to keep; the ones with the highest eigenvalues contain the most information. Last, apply the feature vector (the selected principal components) to the standardized data set by multiplying the transpose of the feature vector by the transpose of the data set.
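To make those steps concrete, here’s a minimal sketch of the whole pipeline from scratch in NumPy. The data here is made up purely for illustration, and names like `feature_vector` are my own labels for the quantities described above, not anything standard:

```python
import numpy as np

# Toy data: 100 rows, 4 correlated variables (made up for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)  # inject some multicollinearity

# 1. Standardize: subtract the mean, divide by the standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized variables
cov = np.cov(X_std, rowvar=False)

# 3. Eigenvalues/eigenvectors, sorted from greatest to least eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance matrices are symmetric
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Keep the top k principal components (here k = 2)
k = 2
feature_vector = eigvecs[:, :k]

# 5. Project the standardized data onto the kept components
X_pca = (feature_vector.T @ X_std.T).T  # equivalent to X_std @ feature_vector

print(X_pca.shape)              # (100, 2)
print(eigvals / eigvals.sum())  # share of variance carried by each component
```

The last print shows why sorting by eigenvalue matters: the fraction of total variance each component carries tells you how much information you’d lose by dropping it.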
Luckily, there are functions to help with this: the built-in princomp() and prcomp() functions in R, and the PCA class in Python’s scikit-learn (though I’m sure both languages have many more in other packages).
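For instance, here’s what the scikit-learn route might look like, again on made-up data. One caveat worth knowing: sklearn’s PCA centers the data for you but doesn’t scale it, so the standardizing step is still on you (StandardScaler handles it):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Made-up data just for the demo
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Standardize first, since PCA only centers (doesn't scale) the data
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)        # keep the two best components
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                    # (100, 2)
print(pca.explained_variance_ratio_) # information retained per component
```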
https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c
https://builtin.com/data-science/step-step-explanation-principal-component-analysis