We’ll Use the Food-Texture Data Set for the Analysis

Applying Factor Analysis in R

The factanal() function

In addition to the data set, the factanal() function requires an estimate of the number of factors (factanal(data, factors = n)). This is a tricky aspect of factor analysis. If we have a hypothesis about the latent variables, we may start with an informed guess. If we do not have any clue about the number of factors and the number of variables in the data set is not too large, we may simply try out several values when initializing the model. Another, more sophisticated approach is to use principal component analysis to get a good initial estimate of the number of factors.
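
As a rough sketch of that last idea (assuming the food-texture data are already loaded into the data frame food used below, with a sample identifier in its first column), one could inspect the principal components and try as many factors as there are components that stand out:

food.pca <- prcomp(food[, -1], scale. = TRUE)  # PCA on the standardized variables
summary(food.pca)                              # proportion of variance per component
screeplot(food.pca, type = "lines")            # look for an "elbow" in the scree plot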

In this simple example we just make a guess and set the number of factors to 2. Further, we keep the defaults for the scores (scores = "none") and the rotation (rotation = "varimax").
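
The call that produces the output below might look like the following (a minimal sketch: the object name food.fa is ours, and we assume the food-texture data have already been read into a data frame called food whose first column holds a sample identifier and is therefore dropped).

food.fa <- factanal(food[, -1], factors = 2)  # two factors, otherwise defaults
food.fa                                       # print the fitted model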

## 
## Call:
## factanal(x = food[, -1], factors = 2)
## 
## Uniquenesses:
##      Oil  Density   Crispy Fracture Hardness 
##    0.334    0.156    0.042    0.256    0.407 
## 
## Loadings:
##          Factor1 Factor2
## Oil      -0.816         
## Density   0.919         
## Crispy   -0.745   0.635 
## Fracture  0.645  -0.573 
## Hardness          0.764 
## 
##                Factor1 Factor2
## SS loadings      2.490   1.316
## Proportion Var   0.498   0.263
## Cumulative Var   0.498   0.761
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 0.27 on 1 degree of freedom.
## The p-value is 0.603

The model output starts with the function call, reminding us of the specification of the fitted model.

The first chunk provides the uniquenesses, which range from 0 to 1. The uniqueness, sometimes referred to as noise, corresponds to the proportion of a variable’s variability that cannot be explained by a linear combination of the factors.

A high uniqueness for a variable indicates that the factors do not account well for its variance.
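
The uniquenesses can also be pulled straight from the fitted object (assuming it was stored as food.fa, as in the sketch above):

food.fa$uniquenesses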

##       Oil   Density    Crispy  Fracture  Hardness 
## 0.3338599 0.1555255 0.0422238 0.2560235 0.4069459
  • The next section is the loadings, which range from −1 to 1. The loadings are the contribution of each original variable to the factor. Variables with a high loading are well explained by the factor. Notice that there is no entry for some variables: by default, R does not print loadings smaller than 0.1, which helps us spot groups of variables.

  • By squaring the loadings we compute the fraction of each variable’s total variance that is explained by the factors (Teetor 2011). This proportion of the variability is known as the communality. Another way to calculate the communality is to subtract the uniqueness from 1. An appropriate factor model results in low values for uniqueness and high values for communality.
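
A minimal sketch of both calculations, again assuming the fitted model is stored in food.fa:

apply(food.fa$loadings^2, 1, sum)      # communality: row sums of squared loadings
1 - apply(food.fa$loadings^2, 1, sum)  # uniqueness: 1 minus the communality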

##       Oil   Density    Crispy  Fracture  Hardness 
## 0.6661398 0.8444745 0.9577762 0.7439766 0.5930539
##       Oil   Density    Crispy  Fracture  Hardness 
## 0.3338602 0.1555255 0.0422238 0.2560234 0.4069461
  • The table beneath the loadings shows the proportion of variance explained by each factor. The row Cumulative Var gives the cumulative proportion of variance explained; these numbers range from 0 to 1. The row Proportion Var gives the proportion of variance explained by each factor, and the row SS loadings gives the sum of squared loadings per factor. The latter is sometimes used to decide whether a factor is worth keeping: by Kaiser’s rule, a factor is worth keeping if its SS loading is greater than 1 (see the code sketch after this list).

  • The last section of the output shows the results of a hypothesis test. The null hypothesis, H0, is that the number of factors in the model (in our example, 2 factors) is sufficient to capture the full dimensionality of the data set. Conventionally, we reject H0 if the p-value is less than 0.05; such a result would indicate that the number of factors is too small. Here the p-value of 0.603 exceeds 0.05, so we do not reject H0.
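
Both the SS loadings and the result of this test can be read off the fitted object; a short sketch, once more assuming the model was stored as food.fa:

colSums(food.fa$loadings^2)  # SS loadings per factor
food.fa$STATISTIC            # chi-square statistic of the sufficiency test
food.fa$PVAL                 # corresponding p-value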

The residual matrix
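
The matrix below can be reproduced by subtracting the correlation matrix implied by the model from the observed correlation matrix (a sketch, again assuming the fitted model food.fa from above):

Lambda <- food.fa$loadings               # estimated loadings
Psi <- diag(food.fa$uniquenesses)        # diagonal matrix of uniquenesses
S <- food.fa$correlation                 # observed correlation matrix
Sigma <- Lambda %*% t(Lambda) + Psi      # correlation matrix implied by the model
round(S - Sigma, 6)                      # residual matrix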

##                Oil   Density    Crispy  Fracture  Hardness
## Oil       0.000000  0.000001 -0.002613 -0.018220 -0.000776
## Density   0.000001  0.000000 -0.001081 -0.007539 -0.000320
## Crispy   -0.002613 -0.001081  0.000000  0.000000  0.000005
## Fracture -0.018220 -0.007539  0.000000  0.000000  0.000033
## Hardness -0.000776 -0.000320  0.000005  0.000033  0.000000

The resulting matrix is called the residual matrix. Numbers close to 0 indicate that our factor model is a good representation of the underlying concept.

Interpretation of the factors

The purpose of a rotation is to produce factors with a mix of high and low loadings and few moderate-sized loadings. The idea is to give meaning to the factors, which helps interpret them. From a mathematical viewpoint, there is no difference between a rotated and an unrotated loadings matrix: the fitted model is the same, the uniquenesses are the same, and the proportion of variance explained is the same.
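
To see this in practice, one could refit the model without rotation and compare the two loading matrices (a sketch; food.fa.none is just an illustrative name):

food.fa.none <- factanal(food[, -1], factors = 2, rotation = "none")
food.fa.none$loadings  # unrotated loadings: harder to read
food.fa$loadings       # varimax-rotated loadings from above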

Now comes the tricky aspect of factor analysis: interpreting the factors themselves. If two variables both have large loadings on the same factor, we know they have something in common. As researchers, we have to understand the data and its meaning in order to give a name to that common ground. Looking at the loadings above, it appears that factor 1 accounts for pastry that is dense and can be bent a lot before it breaks, whereas factor 2 accounts for pastry that is crispy and hard to break apart. So if we had to name these factors, we would probably call them soft pastry (factor 1) and hard pastry (factor 2).