PSY460: Advanced Quantitative Methods

Week #8: Factor Analysis

Today, we’ll discuss the conceptual underpinnings of exploratory factor analysis, and we’ll run a factor analysis on a new dataset. We will also check whether our factors have sufficient internal consistency. Finally, we’ll discuss creating new variables and other critical steps in the data cleaning process.

“Quiz”

  • Write two sentences describing what you learned about factor analysis from the assigned reading and video.
  • Write down two questions you have about factor analysis.
  • Reflect on your performance in the course since our one-on-one meeting. What have you done well, and what could you be working to improve?

What is factor analysis?

  • A “factor” is an unobserved (or “latent”) variable (e.g., IQ) that is indicated by a set of observed measures (e.g., scores on Woodcock-Johnson tests).
    • Factors are identified by a common, underlying source of variance across multiple measures, thus ignoring variance from idiosyncratic aspects of each measure (including measurement error).
  • Typically, exploratory factor analysis (EFA) is conducted prior to confirmatory factor analysis (CFA). For your purposes, you will only use EFA.

Factor extraction

  • There are different ways of “extracting” factor solutions from the data.
    • Maximum likelihood (ML) extraction is a common method; it estimates the factor loadings most likely to have produced the observed relationships among the indicator variables, and it also provides statistics on how well the factor solution accounts for those relationships.

Factor rotation

  • There are also different mathematical transformations that can be applied to the factor loadings to “rotate” the solution in multidimensional space, improving the interpretability of the results.
    • Oblique rotations, which allow the factors to correlate with one another, are typically more realistic than orthogonal rotations, which force the factors to be uncorrelated.
      • Promax rotation is a common oblique rotation method.

Factor selection

  • EFA does not always produce a clear-cut indication of how many latent factors exist for a given set of variables.
    • Various techniques, such as parallel analysis, can provide a rough estimation of how many factors exist in an optimal solution.
    • However, subjective considerations, such as how interpretable the resulting factors are, can be used to override these more objective criteria.

Internal consistency

  • In general, the items in each factor should have a substantial amount of shared variance. This indicates a form of reliability called “internal consistency”.
    • This form of reliability is typically indexed by a measure called Cronbach’s alpha.
      • As a general rule of thumb, the alpha coefficient should be above .70.

Questions posed to participants about targets’ moral character

  • Would you consider [name] to have strong moral integrity?
  • Do you think that [name] is trustworthy?
  • How interested would you be in pursuing a friendship or business partnership with [name]?
  • How warm do you feel toward [name]?
  • Do you expect [name] to act in [respectful, humane, etc.] ways in the future?

Packages needed for today’s analyses

#
library(tidyverse) # This gives us access to dplyr. 
library(psych) # This allows us to perform factor analysis.
library(DescTools) # This will let us compute Cronbach's alpha.
library(magrittr) # This allows us to use the compound assignment pipe (%<>%).

TidyData <- read.csv('TidyData.csv', header = TRUE,
                     stringsAsFactors = FALSE) # Here's the data we'll use.

Determining the number of factors

#
FA_variables <- select(TidyData, IntegrityChange:FutureBehChange) # Just the items to be factor analyzed.

nofactors <- fa.parallel(FA_variables, fm = "ml", fa = "fa") # Parallel analysis with ML extraction, factors only.

Determining the number of factors

Parallel analysis suggests that the number of factors =  2  and the number of components =  NA 

Performing a factor analysis: Two factors

#
FA.2F <- fa(r = FA_variables, 
            nfactors = 2,       # Extract two factors.
            rotate = "promax",  # Oblique (promax) rotation.
            fm = "ml",          # Maximum likelihood extraction.
            residuals = TRUE)   # Keep the residual correlations.

print(FA.2F, sort = TRUE)       # Sort the items by their loadings.

Performing a factor analysis: Two factors

Factor Analysis using method =  ml
Call: fa(r = FA_variables, nfactors = 2, rotate = "promax", residuals = TRUE, 
    fm = "ml")
Standardized loadings (pattern matrix) based upon correlation matrix
                  item  ML2  ML1   h2   u2 com
IntegrityChange      1 0.83 0.04 0.76 0.24 1.0
TrustworthyChange    2 0.62 0.26 0.73 0.27 1.4
FutureBehChange      5 0.47 0.20 0.42 0.58 1.3
PartnerChange        3 0.06 0.87 0.84 0.16 1.0
WarmChange           4 0.28 0.64 0.78 0.22 1.4

                       ML2  ML1
SS loadings           1.83 1.71
Proportion Var        0.37 0.34
Cumulative Var        0.37 0.71
Proportion Explained  0.52 0.48
Cumulative Proportion 0.52 1.00

 With factor correlations of 
     ML2  ML1
ML2 1.00 0.84
ML1 0.84 1.00

Mean item complexity =  1.2
Test of the hypothesis that 2 factors are sufficient.

df null model =  10  with the objective function =  3.42 with Chi Square =  6974.54
df of  the model are 1  and the objective function was  0 

The root mean square of the residuals (RMSR) is  0 
The df corrected root mean square of the residuals is  0 

The harmonic n.obs is  2044 with the empirical chi square  0.01  with prob <  0.92 
The total n.obs was  2044  with Likelihood Chi Square =  0.05  with prob <  0.82 

Tucker Lewis Index of factoring reliability =  1.001
RMSEA index =  0  and the 90 % confidence intervals are  0 0.036
BIC =  -7.57
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   ML2  ML1
Correlation of (regression) scores with factors   0.94 0.95
Multiple R square of scores with factors          0.88 0.90
Minimum correlation of possible factor scores     0.76 0.81

Performing a factor analysis: One factor

#
FA.1F <- fa(r = FA_variables, 
            nfactors = 1,       # This time, extract a single factor.
            rotate = "promax",
            fm = "ml",
            residuals = TRUE)

print(FA.1F, sort = TRUE)

Performing a factor analysis: One factor

Factor Analysis using method =  ml
Call: fa(r = FA_variables, nfactors = 1, rotate = "promax", residuals = TRUE, 
    fm = "ml")
Standardized loadings (pattern matrix) based upon correlation matrix
                  V  ML1   h2   u2 com
WarmChange        4 0.89 0.79 0.21   1
PartnerChange     3 0.88 0.77 0.23   1
TrustworthyChange 2 0.85 0.72 0.28   1
IntegrityChange   1 0.82 0.68 0.32   1
FutureBehChange   5 0.65 0.42 0.58   1

                ML1
SS loadings    3.37
Proportion Var 0.67

Mean item complexity =  1
Test of the hypothesis that 1 factor is sufficient.

df null model =  10  with the objective function =  3.42 with Chi Square =  6974.54
df of  the model are 5  and the objective function was  0.05 

The root mean square of the residuals (RMSR) is  0.02 
The df corrected root mean square of the residuals is  0.03 

The harmonic n.obs is  2044 with the empirical chi square  19.04  with prob <  0.0019 
The total n.obs was  2044  with Likelihood Chi Square =  105.22  with prob <  4.2e-21 

Tucker Lewis Index of factoring reliability =  0.971
RMSEA index =  0.099  and the 90 % confidence intervals are  0.083 0.116
BIC =  67.11
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   ML1
Correlation of (regression) scores with factors   0.96
Multiple R square of scores with factors          0.93
Minimum correlation of possible factor scores     0.85

Identifying potential factors

#
GoodCharacterChange <- select(TidyData, IntegrityChange, 
                              TrustworthyChange, FutureBehChange) # Items loading primarily on ML2.

AttractivePartnerChange <- select(TidyData, PartnerChange, WarmChange) # Items loading primarily on ML1.

GoodCoopPartnerChange <- select(TidyData, IntegrityChange:FutureBehChange) # All five items (one-factor solution).

Internal consistency

CronbachAlpha(GoodCharacterChange)
[1] 0.8275814
CronbachAlpha(AttractivePartnerChange)
[1] 0.8927431
CronbachAlpha(GoodCoopPartnerChange, cond = TRUE)
$unconditional
[1] 0.9080427

$condCronbachAlpha
  Item Cronbach Alpha
1    1      0.8840047
2    2      0.8805512
3    3      0.8777114
4    4      0.8757710
5    5      0.9178407

Creating new variables

#
TidyData %<>% mutate(GoodCoopPartnerChange = # The composite is the mean of the five items.
                       as.numeric(rowMeans(
                         select(., IntegrityChange:FutureBehChange))))
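
If you end up preferring the two-factor solution instead, the same approach could be used to create a composite for each of the narrower factors; here is a sketch based on the item groupings identified above:
#
TidyData %<>% mutate(GoodCharacterChange = 
                       as.numeric(rowMeans(
                         select(., IntegrityChange, TrustworthyChange, 
                                FutureBehChange))),
                     AttractivePartnerChange = 
                       as.numeric(rowMeans(
                         select(., PartnerChange, WarmChange))))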

Creating other variables

  • Every dataset must be cleaned in unique ways. However, one commonality is that datasets are almost never set up the way you would like them to look for analysis.
    • For example, you may have data spread across similar but mutually exclusive variables (e.g., different columns for single individuals and for families).
      • In cases like this, you can use the function “coalesce” from dplyr to combine your data across variables, as “coalesce” creates a new variable containing the first non-missing value across an array of variables (see the sketch below).
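
  • As a minimal sketch of how “coalesce” works (the data frame and income columns below are made up for illustration and are not part of TidyData):
#
ToyData <- tibble(IndividualIncome = c(50000, NA, NA),
                  FamilyIncome     = c(NA, 80000, 75000)) # Hypothetical illustration data.

ToyData %<>% mutate(Income = coalesce(IndividualIncome, FamilyIncome)) # Income gets the first non-missing value in each row.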

Creating other variables

  • In many cases, you will want to create new variables that are conditional upon values of other variables. One option for doing this is to specify a value for each condition, like this:
#
TidyData$Dil[TidyData$Item == "Loyal_Altruistic"] <- "Loyalty Dilemma"
TidyData$Dil[TidyData$Item == "Altruistic_Loyal"] <- "Loyalty Dilemma"
TidyData$Dil[TidyData$Item == "Generous_Frugal"] <- "Generosity Dilemma"
# and so on...

Creating other variables

  • A more streamlined option for creating conditional variables is to make use of functions such as “if_else” or “case_when” from dplyr, like this:
#
TidyData %<>% mutate(Loyal = case_when(Item == "Loyal_Altruistic" & 
                                        Condition == "Lapse" ~ 0,
                                       Item == "Loyal_Altruistic" & 
                                        Condition == "NoLapse" ~ 1,
                                       Item == "Altruistic_Loyal" & 
                                        Condition == "Lapse" ~ 1,
                                       Item == "Altruistic_Loyal" & 
                                        Condition == "NoLapse" ~ 0))

What does your team need to do to produce a cleaned dataset?