submitted by: Navona
due date: 2019-10-08
last ran: 2019-10-08
website: http://rpubs.com/navona/PSY2002_assignment01


Question 1: PCA, unrotated


a) Evaluate KMO and Bartlett’s tests: do we have evidence to move ahead with the PCA?

The KMO heuristic assesses whether the correlation matrix of a given dataset expresses enough common variance, across enough variables, to proceed with PCA, via the measure of sampling adequacy (MSA). The overall MSA is 0.7362842. A general rule of thumb is that an overall MSA ≥ .7 indicates sufficient common variance to proceed with PCA. It is also important that ‘enough’ individual variables express sufficient common variance. The minimum MSA across the 7 variables is 0.6519145 (for the Yuppy variable). A general rule of thumb is that no (or few) variables should have an MSA below .5. Thus, on both grounds, our data meet the KMO criteria to proceed with PCA.
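
As a minimal sketch of how the overall and per-variable MSA can be obtained from a correlation matrix, assuming NumPy is available: the function name kmo_msa and the three-variable toy matrix are illustrative assumptions, not the assignment's data or the code that produced the values above.

```python
import numpy as np

def kmo_msa(R):
    """Return (overall MSA, per-variable MSA) for a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    # Partial (anti-image) correlations come from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    # Only off-diagonal elements enter the KMO ratio.
    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off_diagonal, R ** 2, 0.0)
    q2 = np.where(off_diagonal, partial ** 2, 0.0)
    overall = r2.sum() / (r2.sum() + q2.sum())
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    return overall, per_variable

# Hypothetical toy matrix: three variables, every pairwise correlation 0.5.
toy_R = np.array([[1.0, 0.5, 0.5],
                  [0.5, 1.0, 0.5],
                  [0.5, 0.5, 1.0]])
overall_msa, variable_msa = kmo_msa(toy_R)
print(round(float(overall_msa), 4), np.round(variable_msa, 4))
```

The MSA rises toward 1 when the partial correlations are small relative to the raw correlations, which is the sense in which it measures common variance.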

Bartlett’s test tests the null hypothesis that the correlation matrix of our dataset is an identity matrix, i.e., that the variables are uncorrelated: if we reject the null, we believe there is common variance and we proceed with PCA; if we fail to reject the null, there is no shared variance to summarize and we don’t need to perform a PCA at all. Bartlett’s test is typically used in small datasets (N ≤ 100), which is reasonably close to our N of 157, so it is an appropriate test here. Our results are (very) significant, with a \(\chi^2\) value of 1301.2328659 and a p-value of \(1.3167478 \times 10^{-262}\). This means our correlation matrix differs reliably from an identity matrix (the variance in the dataset better fits an ellipse than a sphere), and our data meet the Bartlett criterion to proceed with PCA.
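
The test statistic can be sketched as follows, assuming the usual chi-square approximation \(\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|\) with \(p(p-1)/2\) degrees of freedom. The function name bartlett_sphericity and the toy inputs are assumptions for illustration; the assignment's raw data are not reproduced here, so the reported value of 1301.2328659 is not recomputed.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for Bartlett's test.

    R is a p x p correlation matrix, n is the number of observations.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, log_det = np.linalg.slogdet(R)  # log|R|, computed stably
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * log_det
    df = p * (p - 1) // 2
    return chi2, df

# An identity matrix (no correlation at all) gives a statistic of zero.
chi2_identity, df_identity = bartlett_sphericity(np.eye(7), n=157)

# Hypothetical two-variable example with r = 0.5 and n = 157.
chi2_toy, df_toy = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n=157)
print(df_identity, round(float(chi2_toy), 4), df_toy)
```

With 7 variables the test has 21 degrees of freedom; the further the determinant of R falls below 1, the larger the statistic.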


b) How many components have eigenvalues greater than one, and what were their eigenvalues?

We see that 2 components have eigenvalues greater than 1. Their values are 4.0514622 and 2.0839173, respectively.


c) How much of the total variance in the variables was accounted for in those components with eigenvalues >1? Does the Scree plot support a decision to use only those components with eigenvalues >1?

We can determine that the combined first 2 components account for a cumulative proportion of 0.87648 of the variance (specifically, PC1 is responsible for 0.57878 and PC2 is responsible for 0.2977). Thus, these two orthogonal PCs carry the majority of variability.
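
These proportions follow directly from the eigenvalues: in a PCA of a correlation matrix, each standardized variable contributes a variance of 1, so the total variance equals the number of variables (7 here). A short check, using only the eigenvalues reported in part (b):

```python
# Eigenvalues greater than 1, as reported in part (b).
eigenvalues = [4.0514622, 2.0839173]
n_variables = 7  # total variance of 7 standardized variables

# Proportion of variance per component = eigenvalue / total variance.
proportions = [ev / n_variables for ev in eigenvalues]
cumulative = sum(proportions)

print([round(p, 5) for p in proportions], round(cumulative, 5))
# [0.57878, 0.2977] 0.87648
```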

The scree plot shows a steep descent, with the ‘elbow’ of the plot falling on PC3, which suggests that we retain only the first 2 PCs. In this case, the scree plot supports Kaiser’s criterion of retaining only those PCs with an eigenvalue > 1. (Kaiser’s criterion is valid here, as the number of variables is < 30 and communalities are > .7.) We can further validate our decision to keep only the first 2 PCs because together they account for > 70% of the variance, which is a third criterion for deciding the number of components to retain. Moreover, keeping too few components is generally preferable to keeping too many (to avoid overfitting).


d) Referring to the “Component Plot” in an unrotated model can help determine whether rotation could help distinguish factors to a greater extent. In this case, would the analysis benefit if the component axes were rotated, or are the variables hovering nicely (and in a separated manner) around each component axis?

The component plot, or bi-plot, of the unrotated model shows a clear separation between factors. (Because we have standardized the variables to unit norm, all vectors have the same length; the structure of the PCA is indicated by the direction, or angle, of the vectors.) Specifically, the Leafs factor (defined by the presence of Income) and the Flames factor (defined by the presence of Kind2Mom) lie on separate axes.

However, rotation almost always improves separation between factors. Here, we can imagine that if the axes were rotated slightly clockwise, they would more closely intersect, and so better separate, the PC1 (Flames) and PC2 (Leafs) variables.



Question 2: PCA, Varimax rotation


a) What do rotations attempt to do?

Rotation is primarily intended to improve simplicity of a given model. Rotated factors are “simple”, in that rotation generally ensures all variables’ factor loadings fall closer to |1| or 0 than unrotated factors. Different criteria for simplicity lead to different methods of rotation.

As an upshot of simplicity, rotation improves interpretability, as factor loadings close to |1| are interpreted as important, whilst those close to 0 are deemed unimportant. Note that rotation does not improve the fit between the data and the factor structure, i.e., any rotated factor solution explains exactly as much correlation in the data as the initial solution. (By extension, there is no such thing as a ‘best’ rotation from a statistical point of view: the choice between rotations is made on non-statistical grounds.) Simply, different rotations (may) give rise to different interpretations of the same data, and the selection of a rotation method and its interpretation is driven by theoretical considerations.
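
The claim that rotation leaves fit unchanged can be illustrated with a small sketch. The loading matrix and the 35-degree angle below are hypothetical values chosen for illustration, not results from the assignment: any orthogonal rotation redistributes variance between factors, but each variable's communality and the total variance explained stay the same.

```python
import numpy as np

# Hypothetical unrotated loadings (4 variables x 2 factors), illustration only.
loadings = np.array([[0.8,  0.4],
                     [0.7,  0.5],
                     [0.6, -0.5],
                     [0.7, -0.4]])

theta = np.deg2rad(35.0)  # arbitrary rotation angle
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
rotated = loadings @ rotation

# Communalities (row sums of squared loadings) are unchanged by rotation...
communalities_before = (loadings ** 2).sum(axis=1)
communalities_after = (rotated ** 2).sum(axis=1)

# ...but the variance attributed to each factor (column sums) is redistributed.
factor_variance_before = (loadings ** 2).sum(axis=0)
factor_variance_after = (rotated ** 2).sum(axis=0)

print(np.round(communalities_before, 4), np.round(communalities_after, 4))
print(np.round(factor_variance_before, 4), np.round(factor_variance_after, 4))
```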


b) Is Varimax orthogonal or oblique?

Varimax is an orthogonal rotation, first described by Kaiser. Varimax works such that each component tends to load highly on a small number of variables (a small number of high loadings), and low on the others (a large number of small loadings). Because Varimax is orthogonal, factors remain uncorrelated.
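
A minimal sketch of a varimax rotation is given below, assuming NumPy and a commonly used iterative SVD scheme. It applies no Kaiser (row) normalization, so its output may differ slightly from statistical software that normalizes first; the function names and the toy loadings are assumptions for illustration and are not the assignment's code or data.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.var(np.asarray(loadings, dtype=float) ** 2, axis=0).sum())

def varimax(loadings, max_iter=500, tol=1e-10):
    """Raw varimax. Returns (rotated loadings, orthogonal rotation matrix)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the varimax criterion with respect to the rotation.
        grad = L.T @ (Lr ** 3 - Lr * (Lr ** 2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt  # orthogonal factor of the gradient's polar decomposition
        if s.sum() < previous * (1 + tol):
            break  # no meaningful improvement any more
        previous = s.sum()
    return L @ R, R

# Hypothetical simple structure: each variable loads on exactly one factor.
simple = np.array([[0.9, 0.0], [0.6, 0.0], [0.3, 0.0],
                   [0.0, 0.9], [0.0, 0.6], [0.0, 0.3]])
angle = np.deg2rad(30.0)
mix = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
mixed = simple @ mix  # blur the structure with a 30-degree rotation

rotated_toy, rotation_matrix = varimax(mixed)
print(np.round(rotated_toy, 3))
print(round(varimax_criterion(mixed), 4), round(varimax_criterion(rotated_toy), 4))
```

In this toy case the rotation pushes each variable's smaller loading back toward 0, which is the "few high, many small" pattern described above, while the rotation matrix stays orthogonal.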


c) Compare the values for cumulative percentage of variance explained in the rotated solution and the non-rotated solution. Are they the same or different?

After rotation, the combined first 2 rotated components (RC1 and RC2) account for a cumulative proportion of 0.8764828 of the variance, which is identical to the cumulative proportion accounted for by PC1 and PC2 in the unrotated solution. However, we see that proportional variance differs between PCs and RCs: the proportional variance of RC1 is 0.465387 (cf 0.57878), and RC2 is 0.4110957 (cf 0.2977).
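
As a check on these figures, the proportion of variance for each rotated component is the sum of its squared loadings divided by the number of variables. The sketch below uses the rotated loadings reported in part (d) and assumes NumPy is available:

```python
import numpy as np

# Varimax-rotated loadings as reported in part (d); columns are RC1, RC2.
rotated_loadings = np.array([
    [ 0.9685527,  0.0638805],   # Income
    [ 0.9506266, -0.0956995],   # Yuppy
    [ 0.9016751,  0.3309342],   # CarFancy
    [-0.0382254,  0.9203649],   # Kind2Mom
    [ 0.2832318,  0.8496021],   # Dwn2Erth
    [ 0.1024243,  0.9389123],   # HardWork
    [ 0.7146564,  0.5517808],   # Luv4Team
])

ss_loadings = (rotated_loadings ** 2).sum(axis=0)       # per component
proportion = ss_loadings / rotated_loadings.shape[0]    # divide by 7 variables
cumulative_proportion = float(proportion.sum())

print(np.round(proportion, 4), round(cumulative_proportion, 4))
```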


d) Which variables load on the “Leafs” factor, and which load on the “Flames” factor? Which variables cross-load? Are the loadings positive or negative and what does that indicate in the context of the factors that you have extracted?

            Leafs        Flames
            RC1          RC2
Income      0.9685527    0.0638805
Yuppy       0.9506266   -0.0956995
CarFancy    0.9016751    0.3309342
Kind2Mom   -0.0382254    0.9203649
Dwn2Erth    0.2832318    0.8496021
HardWork    0.1024243    0.9389123
Luv4Team    0.7146564    0.5517808


We see that Income, Yuppy, and CarFancy clearly load on the Leafs factor, and Kind2Mom, Dwn2Erth, and HardWork clearly load on the Flames factor. Luv4Team is ‘cross-loaded’ on both factors, suggesting that both teams’ fans love their team (here, I am defining cross-loading as a weight ≥ .35 on both factors; there is controversy about this threshold).
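
The cross-loading rule used here can be written out as a short check. This is a sketch of the stated definition (absolute weight ≥ .35 on both factors) applied to the rotated loadings tabulated above; the variable names in the code are illustrative.

```python
# Varimax-rotated loadings (RC1, RC2) from the table above.
rotated = {
    "Income":   (0.9685527,  0.0638805),
    "Yuppy":    (0.9506266, -0.0956995),
    "CarFancy": (0.9016751,  0.3309342),
    "Kind2Mom": (-0.0382254, 0.9203649),
    "Dwn2Erth": (0.2832318,  0.8496021),
    "HardWork": (0.1024243,  0.9389123),
    "Luv4Team": (0.7146564,  0.5517808),
}

threshold = 0.35  # cross-loading cutoff stated in the text

# A variable is cross-loaded when it meets the cutoff on both factors.
cross_loaded = [name for name, (rc1, rc2) in rotated.items()
                if abs(rc1) >= threshold and abs(rc2) >= threshold]

# Otherwise, assign each variable to the factor with the larger absolute loading.
primary_factor = {name: ("Leafs" if abs(rc1) > abs(rc2) else "Flames")
                  for name, (rc1, rc2) in rotated.items()
                  if name not in cross_loaded}

print(cross_loaded)
print(primary_factor)
```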

The components table reveals two negative loadings. The variable Kind2Mom loads negatively on the Leafs factor (-0.0382254), which suggests that Leafs fans are not particularly nice to their mothers. The Yuppy variable loads negatively on the Flames factor (-0.0956995), which suggests that Flames fans are not Yuppies (indeed it is well-known that most Calgarians are NIMBYs). Both loadings are close to 0, however, so these interpretations are tentative at best.


e) Describe the differences you see in loading patterns before and after the rotation. Has the rotation improved the interpretability of the loading pattern (i.e. do variables that were cross-loaded before the rotation appear to load more on one factor or another after rotation)? Do the axes in the rotated component plot appear to fit the variables better or worse than before?

            Leafs        Flames
            PC1          PC2
Income      0.3918219    0.3919717
Yuppy       0.3345869    0.4694674
CarFancy    0.4504289    0.2196597
Kind2Mom    0.2757570   -0.5092584
Dwn2Erth    0.3767808   -0.3299598
HardWork    0.3355815   -0.4572980
Luv4Team    0.4483530    0.0192103

The orthogonal Varimax rotation has helped interpretability. In the rotated solution, the first two components show higher correlations between the variable and the component, i.e., the rotated solution shows values that are closer to |1| or 0. Specifically, only 2 rotated weights fall between |.3| and |.7|, a range considered to indicate weak loadings. In contrast, the unrotated solution shows 8 weights in the same range.

Note, however, that the rotation has not reduced the number of observed cross-loadings. In the rotated solution, 1 variable (Luv4Team) is ‘cross-loaded’ (again, defined as weights > |.35| on both examined factors); the unrotated solution also has 1 variable (Income) meeting this definition. Even so, each variable’s two weights are more clearly separated after rotation: the sum, across variables, of the absolute difference between a variable’s two weights is greater in the rotated solution (5.0460634) than in the unrotated solution (3.0795777), which represents an improvement in interpretability.
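
The two separation totals quoted here can be reproduced from the tabulated weights. The sketch below assumes NumPy and takes, for each variable, the absolute difference between its two (signed) weights, then sums across variables:

```python
import numpy as np

# Rows: Income, Yuppy, CarFancy, Kind2Mom, Dwn2Erth, HardWork, Luv4Team.
rotated_weights = np.array([      # RC1, RC2 (Varimax)
    [ 0.9685527,  0.0638805],
    [ 0.9506266, -0.0956995],
    [ 0.9016751,  0.3309342],
    [-0.0382254,  0.9203649],
    [ 0.2832318,  0.8496021],
    [ 0.1024243,  0.9389123],
    [ 0.7146564,  0.5517808],
])
unrotated_weights = np.array([    # PC1, PC2 (unrotated)
    [0.3918219,  0.3919717],
    [0.3345869,  0.4694674],
    [0.4504289,  0.2196597],
    [0.2757570, -0.5092584],
    [0.3767808, -0.3299598],
    [0.3355815, -0.4572980],
    [0.4483530,  0.0192103],
])

def separation(weights):
    """Sum across variables of |weight on factor 1 - weight on factor 2|."""
    return float(np.abs(weights[:, 0] - weights[:, 1]).sum())

rotated_separation = separation(rotated_weights)
unrotated_separation = separation(unrotated_weights)
print(round(rotated_separation, 7), round(unrotated_separation, 7))
```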

Another consideration is improved reliability of the factors. Reliability is typically assessed with the absolute magnitude and number of loadings: components with at least 4 loadings > |.6|, or 3 loadings > |.8|, are reliable. On this basis, neither PC1 nor PC2 is reliable. However, both RC1 and RC2 are.

A final way of evaluating the rotation is via visualization of the ‘biplot’, which shows both individuals and variables. The left and bottom axes show RC scores; the top and right axes show weights. Here, we see that the axes in the rotated component plot more closely intersect, and so better separate, RC1 from RC2.


Question 3: Factor Analysis, Varimax rotation


a) How do PCA and FA use correlation matrices differently during computation? How is this reflected in the total variance explained?

PCA and FA use different correlation matrices during computation. Specifically, the values on the diagonal (“communalities”) differ. The PCA matrix shows “total variance”, i.e., values of 1; in contrast, the FA matrix shows estimates of “common variance” that fall below 1. This difference reflects a theoretical difference between PCA and FA: PCA assumes that all variability is common, and all unique sources of variability are 0. In contrast, FA assumes there is measurement error, and thus its communalities express variation accounted for by the common factors, but not that attributed to the unique factor.

                           PCA - unrotated   PCA - varimax   PAF - varimax
total variance explained   0.87648           0.8764828       0.8299304
component 1                0.57878           0.4653870       0.4468735
component 2                0.29770           0.4110957       0.3830570

This difference is reflected in differences in total variance explained between our PCA and FA models. In our dataset, we see that the first two components of both PCA solutions (with and without rotation) explain a proportion of 0.8764828 of variance. In contrast, the FA solution explains slightly less, at 0.8299304. The difference is attributable to the FA model’s incorporation of a measurement error estimate.


b) In relation to the Rotated loading pattern from Question #2, has using PAF changed your loading pattern to any great extent, or has the general loading pattern remained similar? If there are differences between the loading matrices, note which variables load differently. You can also refer to the component plot for the rotated solution as well if it helps to tell similarities and differences from the solution in Question #2, but this is not necessary to answer the question.

            Leafs        Flames
            PA1          PA2
Income      0.9677604    0.0729750
Yuppy       0.9309150   -0.0808345
CarFancy    0.8859563    0.3443202
Kind2Mom   -0.0236946    0.8683327
Dwn2Erth    0.2752656    0.7964204
HardWork    0.0990509    0.9322881
Luv4Team    0.6737130    0.5417882


As displayed in the visualization below, the factor loading pattern is very similar between PCA and PAF, both with varimax rotation (the loading pattern of PCA without rotation is included for comparison). This similarity makes sense: selecting PCA vs. FA should ultimately yield little or no difference to the conclusion of the analysis, as long as the number of variables included is moderately large (> 30), and the FA analysis contains virtually no variables expected to have low communalities (e.g., .4). Though the first of these conditions isn’t met here, our data are clearly sufficient to realize this general rule.
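
The similarity, and the slightly lower variance explained under PAF, can both be read from the communalities implied by the two rotated loading tables. A sketch assuming NumPy, using the values reported above: each communality is the sum of a variable's squared loadings, and the total proportion of variance explained is the mean communality.

```python
import numpy as np

variables = ["Income", "Yuppy", "CarFancy", "Kind2Mom",
             "Dwn2Erth", "HardWork", "Luv4Team"]

pca_varimax = np.array([          # RC1, RC2 from Question 2
    [ 0.9685527,  0.0638805], [ 0.9506266, -0.0956995],
    [ 0.9016751,  0.3309342], [-0.0382254,  0.9203649],
    [ 0.2832318,  0.8496021], [ 0.1024243,  0.9389123],
    [ 0.7146564,  0.5517808],
])
paf_varimax = np.array([          # PA1, PA2 from Question 3
    [ 0.9677604,  0.0729750], [ 0.9309150, -0.0808345],
    [ 0.8859563,  0.3443202], [-0.0236946,  0.8683327],
    [ 0.2752656,  0.7964204], [ 0.0990509,  0.9322881],
    [ 0.6737130,  0.5417882],
])

# Communality of each variable = sum of its squared loadings.
pca_communalities = (pca_varimax ** 2).sum(axis=1)
paf_communalities = (paf_varimax ** 2).sum(axis=1)

# Largest absolute change in any single loading between the two solutions.
largest_loading_change = float(np.abs(pca_varimax - paf_varimax).max())

for name, h_pca, h_paf in zip(variables, pca_communalities, paf_communalities):
    print(f"{name:9s} PCA {h_pca:.4f}  PAF {h_paf:.4f}")
print(round(float(pca_communalities.mean()), 4),
      round(float(paf_communalities.mean()), 4))
```

Every PAF communality is below its PCA counterpart, consistent with FA setting aside unique variance, while no individual loading moves by much.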


Question 4: Factor Analysis, Promax rotation


a) Are there sizable differences between loadings in the “Pattern” and “Structure” matrices? If so, where are the biggest differences?

In oblique rotations (such as Promax), the pattern and structure matrices are distinct. Specifically, the pattern matrix holds the loadings, which are analogous to standardized regression coefficients from a multiple regression analysis. A given element indicates the importance of that factor to the variable, with the influence of the other factors partialled out. The structure matrix holds simple correlations between the variables and the factors. (Note that in orthogonal rotations, such as Varimax, the loadings and correlations are identical.)

In our data, we see large differences between the pattern and structure matrices (a difference matrix is visualized for ease in the rightmost plot). Specifically, the differences, from largest to smallest, are: Income (PA2), Yuppy (PA2), HardWork (PA1), Kind2Mom (PA1), CarFancy (PA2), Dwn2Erth (PA1), Luv4Team (PA2), Luv4Team (PA1), Yuppy (PA1), Kind2Mom (PA2), CarFancy (PA1), Dwn2Erth (PA2), Income (PA1), HardWork (PA2).


b) What do the differences suggest about the level of relation between the factors? Confirm your argument by calculating the correlation between factors.

These differences between the structure and pattern matrices exist because oblique rotation allows for correlated factors (unlike orthogonal rotation, in which correlation between the factors is equal to 0). The pattern and structure matrices are linked by the factor correlation matrix (i.e., pattern %*% correlation = structure). The off-diagonal term in this matrix, i.e., the correlation between PA1 and PA2, is r = 0.3676361, indicating a moderate positive relation between the factors.
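
The link between the two matrices can be sketched as below, assuming NumPy. The factor correlation is the value reported above; the pattern matrix is hypothetical, since the Promax pattern loadings are not tabulated in this report.

```python
import numpy as np

r = 0.3676361  # correlation between PA1 and PA2 reported above
factor_correlation = np.array([[1.0, r],
                               [r, 1.0]])

# Hypothetical pattern loadings (illustration only, not the Promax output).
pattern = np.array([[0.9, 0.0],    # loads only on factor 1
                    [0.0, 0.8],    # loads only on factor 2
                    [0.5, 0.4]])   # loads on both

# Structure matrix = pattern matrix times the factor correlation matrix.
structure = pattern @ factor_correlation
difference = structure - pattern

print(np.round(structure, 4))
print(np.round(difference, 4))
```

Even a variable with a pattern loading of 0 on a factor correlates with it (0.9 × 0.3676361 ≈ 0.33 in the first row), so the largest pattern-structure gaps appear on each variable's secondary factor. With r = 0 the two matrices would coincide, as in an orthogonal rotation.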