Principal Component Analysis and Factor Analysis

Datasets often contain a large number of variables, which makes it difficult to analyze the data and draw inferences from it. Hence, it is often necessary to reduce the dimensionality of the dataset to obtain appropriate, meaningful and valid results. Principal component analysis and factor analysis are both data reduction techniques that allow you to capture the variance of many variables in a smaller set. Both techniques are described briefly below.

I. Principal Component Analysis

Principal component analysis (PCA) is one of a family of techniques for taking high-dimensional data and using the dependencies between the variables to represent it in a more tractable, lower-dimensional form without losing too much information. PCA is one of the simplest and most robust ways of doing such dimensionality reduction. PCA's approach to data reduction is to create one or more index variables from a larger set of measured variables. It does this using a linear combination (weighted average) of a set of variables. The created index variables are called components.
It is quite likely that the first few principal components account for most of the variability in the original data. If so, these few principal components can then replace the initial p variables in subsequent analysis, thus reducing the effective dimensionality of the problem.
The picture below shows what PCA is doing to combine four measured (Y) variables into a single component, C. You can see from the direction of the arrows that the Y variables contribute to the component variable. The weights allow this combination to emphasize some Y variables more than others.

[Figure: PCA]
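As a concrete illustration of building such a component, here is a minimal sketch in Python with NumPy and scikit-learn (my choice of tools; the article does not prescribe any). Four hypothetical Y variables are combined into a single component C, and the fitted weights and captured variance are printed.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples of four measured Y variables,
# three of which are strongly related.
rng = np.random.default_rng(0)
y1 = rng.normal(size=100)
Y = np.column_stack([
    y1,
    y1 + rng.normal(scale=0.3, size=100),
    y1 - rng.normal(scale=0.3, size=100),
    rng.normal(size=100),
])

# Fit PCA and keep a single component C, a weighted combination of the Y's.
pca = PCA(n_components=1)
C = pca.fit_transform(Y)              # component scores, shape (100, 1)

print(pca.components_)                # weights applied to each Y variable
print(pca.explained_variance_ratio_)  # share of total variance captured by C
```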

Steps for performing principal components analysis

  1. Take the whole dataset consisting of d-dimensional samples ignoring the class labels.
  2. Compute the d-dimensional mean vector (i.e., the means for every dimension of the whole dataset).
  3. Compute the scatter matrix (alternatively, the covariance matrix) of the whole data set.
  4. Compute the eigenvectors (e1, e2, …, ed) and the corresponding eigenvalues (λ1, λ2, …, λd).
  5. Sort the eigenvectors by decreasing eigenvalue and choose the k eigenvectors with the largest eigenvalues to form a d × k matrix W (where every column represents an eigenvector).
  6. Use this d × k eigenvector matrix to transform the samples onto the new subspace. This is summarized by the equation y = W^T x (where x is a d × 1 vector representing one sample, and y is the transformed k × 1 sample in the new subspace). A from-scratch sketch of these steps is shown below.
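The following sketch works through the six steps in Python/NumPy (my choice of language; the article itself does not specify one):

```python
import numpy as np

def pca(X, k):
    """Project n d-dimensional samples X (shape n x d) onto the top-k principal components."""
    # Steps 1-2: take the data (class labels ignored) and compute the d-dimensional mean vector.
    mean = X.mean(axis=0)
    Xc = X - mean

    # Step 3: covariance matrix of the centred data (d x d).
    cov = np.cov(Xc, rowvar=False)

    # Step 4: eigenvectors and eigenvalues (eigh is suited to symmetric matrices).
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 5: sort by decreasing eigenvalue and keep the k leading eigenvectors
    # as the columns of the d x k matrix W.
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]

    # Step 6: transform every sample with y = W^T x, applied to all rows at once.
    return Xc @ W, W

# Example: reduce hypothetical 5-dimensional data to k = 2 dimensions.
X = np.random.default_rng(1).normal(size=(200, 5))
Y, W = pca(X, k=2)
print(Y.shape, W.shape)  # (200, 2) (5, 2)
```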

II. Factor Analysis

Factor analysis is a class of procedures used for data reduction and summarization. It is an interdependence technique: it makes no distinction between dependent and independent variables. Factor analysis is used to identify underlying dimensions, or factors, that explain the correlations among a set of variables. It is a model of the measurement of a latent variable, one that cannot be observed directly but is instead measured with multiple variables and seen through the relationships it causes in a set of Y variables.

For example, it is difficult to measure soil health directly. However, we can assess whether soil health is high or low with a set of variables, such as chemical indicators ("What is the pH level of the soil?", "What are the nutrient levels?"), physical indicators ("What is the texture of the soil?", "What is the porosity of the soil?"), and likewise biological, microbiological and biochemical indicators.

The measurement model for a simple, one-factor model looks like the diagram below. It is counter-intuitive, but F, the latent factor, is causing the responses on the four measured Y variables; therefore, the arrows go in the opposite direction from PCA. Just as in PCA, the relationships between F and each Y are weighted, and the factor analysis figures out the optimal weights. The model also includes a set of error terms, designated by the u's, which represent the variance in each Y that is unexplained by the factor.

[Figure: Factor Analysis]
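A minimal sketch of this one-factor measurement model, assuming Python with NumPy and scikit-learn (tools of my choosing, not the article's): data for four Y variables are simulated from a latent factor F with weights and error terms u, and a factor analysis recovers the loadings and the unexplained error variances.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulate the measurement model: a latent factor F drives four observed Y's,
# each through its own weight (loading) plus an error term u.
rng = np.random.default_rng(42)
n = 500
F = rng.normal(size=n)                      # latent factor (never observed directly)
loadings = np.array([0.9, 0.8, 0.7, 0.6])   # true weights of F on Y1..Y4
U = rng.normal(scale=0.4, size=(n, 4))      # error terms u1..u4
Y = F[:, None] * loadings + U               # the four measured Y variables

# Fit a one-factor model and inspect what it recovers.
fa = FactorAnalysis(n_components=1)
fa.fit(Y)
print(fa.components_)      # estimated loadings (weights of F on each Y)
print(fa.noise_variance_)  # variance in each Y left unexplained by the factor
```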

There are two types of factor analyses: exploratory factor analysis (or EFA) and confirmatory factor analysis (or CFA).
a) Exploratory factor analysis (EFA): It attempts to discover the nature of the constructs influencing a set of responses.
b) Confirmatory factor analysis (CFA): It tests whether a specified set of constructs is influencing responses in a predicted way.
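For exploratory factor analysis specifically, one commonly used Python option is the third-party factor_analyzer package; the sketch below assumes that package and invented column names, since the article names no tool or dataset.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical soil-indicator data driven by one latent "soil health" factor.
rng = np.random.default_rng(7)
latent = rng.normal(size=(300, 1))
data = pd.DataFrame(
    latent @ np.array([[0.8, 0.7, 0.6, 0.5]]) + rng.normal(scale=0.5, size=(300, 4)),
    columns=["pH", "nutrients", "texture", "porosity"],
)

# EFA: let the data suggest the factor structure (here a single factor is extracted).
fa = FactorAnalyzer(n_factors=1, rotation=None)
fa.fit(data)
print(fa.loadings_)              # estimated factor loadings
print(fa.get_factor_variance())  # variance explained by the extracted factor
```

A confirmatory factor analysis, by contrast, requires specifying the expected factor structure up front and is usually carried out with structural equation modelling software.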

Good Reads & References:
1. https://stats.stackexchange.com/questions/1576/what-are-the-differences-between-factor-analysis-and-principal-component-analysis
2. https://machinelearningmastery.com