Sensitivity analysis of Principal Component Analysis (PCA) results is an important step in understanding how stable the analysis is and how robust its conclusions are. This paper discusses the main factors to consider when carrying out a sensitivity analysis of PCA results.
Sample size: The size of the sample used in PCA can affect the sensitivity of the results. A larger sample generally gives more accurate results, because the estimated components are less affected by outliers and noise. Size alone is not enough, however: the additional observations should carry meaningful information about the data. Adding uninformative or unrepresentative observations can harm the analysis rather than help it.
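One way to examine the effect of sample size is to bootstrap the data and measure how much the first principal direction moves between resamples. The following is a minimal sketch under stated assumptions: the data are synthetic, the function names `first_pc` and `bootstrap_pc1_stability` are hypothetical, and the stability score (mean absolute cosine with the full-sample direction) is just one possible choice of measure.

```python
import numpy as np

def first_pc(X):
    """Return the unit-length first principal direction of X (rows = samples)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def bootstrap_pc1_stability(X, n_boot=200, seed=0):
    """Mean absolute cosine between the full-sample PC1 and bootstrap PC1s."""
    rng = np.random.default_rng(seed)
    ref = first_pc(X)
    n = X.shape[0]
    cosines = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        # abs() because the sign of a principal direction is arbitrary
        cosines.append(abs(ref @ first_pc(X[idx])))
    return float(np.mean(cosines))

# Synthetic data (an assumption for illustration): three correlated variables.
rng = np.random.default_rng(42)
cov = np.array([[3.0, 1.5, 0.5],
                [1.5, 2.0, 0.3],
                [0.5, 0.3, 1.0]])
population = rng.multivariate_normal(np.zeros(3), cov, size=5000)

stability_small = bootstrap_pc1_stability(population[:20])
stability_large = bootstrap_pc1_stability(population[:2000])
print(f"n=20:   {stability_small:.3f}")
print(f"n=2000: {stability_large:.3f}")
```

A score close to 1 means the first component barely changes under resampling; a noticeably lower score for the small sample suggests that the component is not yet well determined at that sample size.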
Choice of variables: The variables selected for the analysis are very important, because they can affect the sensitivity of the results. Including variables that are highly correlated with each other can distort the results, since the information they share is counted more than once. For example, if two variables leak information to each other (so that one is largely determined by the other), including both adds little new information to the PCA. As far as possible, each selected variable should contribute distinct variance.
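The effect of a redundant variable can be illustrated with a small experiment. The sketch below uses synthetic data and a hypothetical correlation threshold of 0.95; both are assumptions chosen for illustration. It flags near-duplicate pairs from the correlation matrix and compares the share of variance taken by the first component with and without the near-duplicate.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance per principal component, after standardising columns."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # descending order
    return eigvals / eigvals.sum()

rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 3))  # three roughly independent variables
# Column 3 is almost a copy of column 0 (synthetic "leaking" variable).
near_copy = base[:, [0]] + 0.05 * rng.normal(size=(1000, 1))
augmented = np.hstack([base, near_copy])

# Flag variable pairs whose absolute correlation exceeds the (hypothetical) threshold.
corr = np.corrcoef(augmented, rowvar=False)
i, j = np.where(np.triu(np.abs(corr) > 0.95, k=1))
redundant_pairs = list(zip(i.tolist(), j.tolist()))

ratio_base = explained_variance_ratio(base)
ratio_aug = explained_variance_ratio(augmented)
print("redundant pairs:", redundant_pairs)
print("PC1 share without near-duplicate:", round(float(ratio_base[0]), 3))
print("PC1 share with near-duplicate:   ", round(float(ratio_aug[0]), 3))
```

In this setup the first component's share rises once the near-duplicate is added, even though no new information has entered the data, which is the kind of distortion a sensitivity analysis over the variable set is meant to reveal.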
Scaling method: Scaling matters because PCA is driven by variance, so a variable measured on a much larger range than the others can dominate the components purely because of its units and lead to misleading results. The scaling method used in PCA can therefore affect the sensitivity of the results, and different scaling methods, such as standardisation and variance scaling, can produce different outcomes.
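The sketch below compares the first principal direction of the same synthetic data with and without standardisation. The data, the unit factors, and the helper name `pc1_loadings` are assumptions made for illustration; only unscaled and standardised inputs are compared, not every scaling method.

```python
import numpy as np

def pc1_loadings(X):
    """First principal direction of X, with the sign fixed for readability."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0]
    # The sign of a principal direction is arbitrary; make the largest entry positive.
    return v if v[np.argmax(np.abs(v))] > 0 else -v

rng = np.random.default_rng(1)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
X = z * np.array([1000.0, 1.0])  # column 0 in large units, column 1 in small units

raw = pc1_loadings(X)
standardised = pc1_loadings((X - X.mean(axis=0)) / X.std(axis=0, ddof=1))
print("unscaled PC1:    ", np.round(raw, 4))
print("standardised PC1:", np.round(standardised, 4))
```

Without scaling, the first component is almost entirely the large-unit variable. After standardisation the two positively correlated variables receive equal weight, so the conclusion drawn from the first component depends directly on the scaling choice.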
Number of principal components: The number of principal components retained is an important factor to consider because it can affect the sensitivity of the results. In PCA, the goal is to reduce the dimensionality of the data by extracting the most important information from the original variables and representing it in a smaller set of variables, called principal components. Retaining too few components can give an oversimplified representation in which important patterns and variations in the data are not fully captured. Retaining too many can lead to overfitting: the model becomes too complex and may capture noise or random variation, leading to poor generalisation and predictive performance. The number of components should therefore be chosen as a trade-off between the amount of variance explained and the complexity of the model. In practice, it is often determined with a scree plot or with other techniques, such as cross-validation, that help identify a number of components that captures most of the variability in the data while avoiding overfitting.
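A common numerical companion to the scree plot is the cumulative explained variance. The sketch below picks the smallest number of components whose cumulative share reaches a chosen threshold. The 90% threshold, the synthetic two-factor data, and the function name `choose_n_components` are assumptions for illustration, not a recommended rule.

```python
import numpy as np

def choose_n_components(X, threshold=0.90):
    """Smallest k whose cumulative explained-variance ratio reaches the threshold."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    ratio = s**2 / np.sum(s**2)              # variance share per component
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratio)), ratio, cumulative

# Synthetic data: two strong latent factors spread over six variables, plus noise.
rng = np.random.default_rng(7)
n, p = 500, 6
factors = rng.normal(size=(n, 2)) * np.array([3.0, 2.0])
Q, _ = np.linalg.qr(rng.normal(size=(p, 2)))  # 6x2 matrix with orthonormal columns
X = factors @ Q.T + 0.3 * rng.normal(size=(n, p))

k, ratio, cumulative = choose_n_components(X, threshold=0.90)
for idx, (r, c) in enumerate(zip(ratio, cumulative), start=1):
    print(f"PC{idx}: share={r:.3f} cumulative={c:.3f}")
print("components retained:", k)
```

Repeating the selection with several thresholds, or on bootstrap resamples, shows whether the chosen number of components is stable or changes with small perturbations of the data.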
Rotation method: Rotation in PCA is a technique used to transform the principal components obtained from the initial extraction step into a new set of components that are easier to interpret. The goal of rotation is to produce a small number of new components that retain most of the variance from the initial set of components, while each has high correlations with only a small number of the original variables. The transformation is done by applying a transformation matrix to the initial set of components, which moves the axes to new positions. The two most common rotation methods used in PCA are varimax and oblimin. Varimax is an orthogonal rotation: the new axes remain orthogonal, and the resulting factors are easier to interpret because they are kept separate from each other. Oblimin, on the other hand, is an oblique rotation: it allows the factors to be correlated, which is often more realistic because in most cases the factors in the real world are not completely independent of each other.
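The orthogonal case can be sketched with a widely used iterative SVD formulation of varimax. This is a minimal illustration, not a reference implementation: the loading matrix is hypothetical (a clean two-group structure that has been deliberately rotated by 30 degrees), the names `varimax` and `varimax_criterion` are chosen here, and oblimin is not shown.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-8):
    """Orthogonally rotate a (variables x components) loading matrix by varimax."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient-like matrix of the varimax criterion, projected back onto rotations.
        G = loadings.T @ (L**3 - L @ np.diag(np.sum(L**2, axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        d_new = np.sum(s)
        if d != 0.0 and d_new / d < 1.0 + tol:
            break
        d = d_new
    return loadings @ R, R

def varimax_criterion(L):
    """Sum over components of the variance of the squared loadings."""
    return float(np.sum(np.var(L**2, axis=0)))

# Hypothetical loadings: variables 0-2 load on one factor, variables 3-5 on another.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix  # the same structure, but harder to read

rotated, R = varimax(unrotated)
print(np.round(rotated, 3))
```

In this constructed example each variable ends up loading on essentially one component after rotation, while the sum of squared loadings per variable is unchanged. Comparing unrotated and rotated loadings in this way is one check on whether an interpretation depends on the rotation chosen.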
Interpretation of results: Interpretation is the step at which the outcome of the sensitivity analysis is judged. It is essential to assess how robust the results are to the choices described above and to determine whether they remain consistent with the research questions and hypotheses.
In conclusion, conducting a sensitivity analysis of PCA results is crucial to validate the robustness and reliability of the analysis. By taking into account the various factors that can affect the sensitivity of the results, such as sample size, choice of variables, scaling method, number of principal components, and rotation method, researchers can improve the accuracy and consistency of their findings. Sensitivity analysis can also provide insights into the underlying data structure, and help to identify potential outliers or noise that may be affecting the results. By addressing these factors through sensitivity analysis, researchers can have greater confidence in the validity of their PCA results, and draw more accurate conclusions about the relationships and patterns within their data.