In this tutorial, we will conduct both exploratory and confirmatory factor analysis using the nationally representative 2020 American National Election Studies (ANES) survey. The survey asked respondents about 9 distinct political emotions, and we will use these 9 variables to understand how political emotions are structured. It is currently widely believed that three latent emotional factors best explain the structure of political emotions (Marcus et al. 2006): an aversion factor (anger, outrage, and irritation), an anxiety factor (fear, worry, and nervousness), and a positive-emotion factor (hope, happiness, and pride).
We will use factor analysis to test whether a three factor solution really is the best way to explain these political emotions.
df <- anes %>%
  dplyr::select(V201115, V201116, V201117, V201118, V201119, V201120,
                V201121, V201122, V201123) #Save only the variables you want to include in your analysis
df[df <= -1] <- NA #Recode negative (non-substantive) values to missing
new_names <- c("hope", "afraid", "outrage", "angry", "happy", "worried", "proud", "irritated", "nervous") #Give your variables new informative names
# Update column names
colnames(df) <- new_names #Apply new names to your data frame
skim(df) #Checks the variables in your data frame; evaluate for missing data
Name | df |
Number of rows | 3000 |
Number of columns | 9 |
_______________________ | |
Column type frequency: | |
numeric | 9 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
hope | 6 | 1 | 2.52 | 1.17 | 1 | 2 | 3 | 3 | 5 | ▆▇▇▃▂ |
afraid | 4 | 1 | 3.38 | 1.22 | 1 | 3 | 3 | 4 | 5 | ▂▅▇▇▆ |
outrage | 3 | 1 | 3.58 | 1.25 | 1 | 3 | 4 | 5 | 5 | ▂▃▆▇▇ |
angry | 2 | 1 | 3.59 | 1.21 | 1 | 3 | 4 | 5 | 5 | ▂▃▇▇▇ |
happy | 7 | 1 | 1.96 | 1.07 | 1 | 1 | 2 | 3 | 5 | ▇▃▃▁▁ |
worried | 2 | 1 | 3.68 | 1.14 | 1 | 3 | 4 | 5 | 5 | ▁▃▆▇▇ |
proud | 9 | 1 | 2.02 | 1.16 | 1 | 1 | 2 | 3 | 5 | ▇▃▃▂▁ |
irritated | 2 | 1 | 3.84 | 1.13 | 1 | 3 | 4 | 5 | 5 | ▁▂▅▇▇ |
nervous | 1 | 1 | 3.53 | 1.19 | 1 | 3 | 4 | 5 | 5 | ▂▃▇▇▇ |
To start, we read in our data, in this case the 2020 American National Election Studies survey. Then we save a new data frame that includes only the nine political emotions variables we want to analyze. We will use this data frame throughout the code. Once the political emotions variables are saved as a new data frame, we recode all negative values to NA, because these are non-substantive responses that should be removed (as indicated in the associated codebook).
Next, we change the variable names to something more informative. This aids interpretation of the factor analysis results and should not be skipped. Lastly, we use the skimr package to quickly skim the variables in our dataset. Every variable should have a minimum value of 1 and no negative values, since negative values are treated as missing data.
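The recoding step can be tried on a small example before running it on the survey. The sketch below uses a made-up two-column data frame (the values and the specific negative codes are assumptions for illustration, not ANES data) to show how logical indexing replaces every negative code with NA.

```r
# Toy data frame standing in for the survey items (values are made up).
# Negative codes mimic the survey's non-substantive responses.
toy <- data.frame(
  hope   = c(3, -9, 5, 1),
  afraid = c(-1, 4, 2, -8)
)

# Logical indexing: every cell with a value <= -1 is replaced by NA.
toy[toy <= -1] <- NA

print(toy)
print(colSums(is.na(toy)))             # number of missing values per column
print(sapply(toy, min, na.rm = TRUE))  # minimum of each column after recoding
```

Checking the per-column minimum after recoding is the same sanity check that the skim output provides for the full data frame.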
The first step is to check the correlations between your variables, political emotions here, to see how strongly the individual items are related. Below, we create a matrix of the correlations between the individual items and plot it as a heat map for easier viewing.
#Step 1: Evaluate correlations
cor_matrix <- cor(df, use = "pairwise.complete.obs") #Saves correlation matrix
corrplot(cor_matrix, method = "circle", type = "lower", diag = FALSE) #Plot correlation matrix as a heatmap
cor_matrix <- round(cor_matrix, 3)
cor_matrix
## hope afraid outrage angry happy worried proud irritated nervous
## hope 1.000 -0.420 -0.335 -0.357 0.638 -0.420 0.642 -0.370 -0.436
## afraid -0.420 1.000 0.627 0.648 -0.453 0.751 -0.416 0.604 0.763
## outrage -0.335 0.627 1.000 0.766 -0.417 0.650 -0.394 0.691 0.622
## angry -0.357 0.648 0.766 1.000 -0.423 0.656 -0.400 0.733 0.637
## happy 0.638 -0.453 -0.417 -0.423 1.000 -0.493 0.748 -0.459 -0.479
## worried -0.420 0.751 0.650 0.656 -0.493 1.000 -0.448 0.658 0.776
## proud 0.642 -0.416 -0.394 -0.400 0.748 -0.448 1.000 -0.438 -0.445
## irritated -0.370 0.604 0.691 0.733 -0.459 0.658 -0.438 1.000 0.603
## nervous -0.436 0.763 0.622 0.637 -0.479 0.776 -0.445 0.603 1.000
###At least 2 and possibly 3 distinct factors are suggested by examining the correlations
Results indicate at least two and probably three distinct factors in the political emotions variables. Three factors would match the dominant belief in the literature on how political emotions are structured (Marcus et al. 2006). The three positive emotions are clearly positively related to each other, with smaller, negative correlations with the other six emotions. The six negative emotions are all positively and fairly strongly related to one another (correlations between 0.60 and 0.78), which could indicate that they all measure the same concept. However, closer examination of the results shows that being afraid is more highly correlated with being worried or nervous than it is with anger, outrage, or irritation. That matches the underlying theoretical belief that those three items represent an “anxiety about politics” factor, whereas anger, outrage, and irritation represent an “aversion” factor towards politics.
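The comparison between the anxiety and aversion items can be made concrete by averaging the correlations printed above. The sketch below retypes the six negative-emotion rows of that matrix by hand; the helper `mean_offdiag` and the grouping vectors are names introduced here for illustration only.

```r
# Correlations among the six negative emotions, copied from the printed matrix.
neg <- c("afraid", "outrage", "angry", "worried", "irritated", "nervous")
R_neg <- matrix(c(
  1.000, 0.627, 0.648, 0.751, 0.604, 0.763,
  0.627, 1.000, 0.766, 0.650, 0.691, 0.622,
  0.648, 0.766, 1.000, 0.656, 0.733, 0.637,
  0.751, 0.650, 0.656, 1.000, 0.658, 0.776,
  0.604, 0.691, 0.733, 0.658, 1.000, 0.603,
  0.763, 0.622, 0.637, 0.776, 0.603, 1.000
), nrow = 6, byrow = TRUE, dimnames = list(neg, neg))

anxiety  <- c("afraid", "worried", "nervous")
aversion <- c("outrage", "angry", "irritated")

# Average of the off-diagonal (lower-triangle) entries of a square block.
mean_offdiag <- function(m) mean(m[lower.tri(m)])

within_anxiety  <- mean_offdiag(R_neg[anxiety, anxiety])
within_aversion <- mean_offdiag(R_neg[aversion, aversion])
between         <- mean(R_neg[anxiety, aversion])

print(round(c(within_anxiety  = within_anxiety,
              within_aversion = within_aversion,
              between         = between), 3))
```

If the items within each group correlate more strongly with each other than with items in the other group, that is informal evidence for two separate negative factors rather than one.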
Conducting a factor analysis will help us better understand how these 9 individual political emotions are related to one another. We will conduct both exploratory and confirmatory factor analysis to illustrate both methods.
We’ll start with an exploratory factor analysis using principal axis factoring (PAF) before turning to principal component factor analysis (PCF). Eventually, we’ll compare the results of the two approaches. PCF approaches are generally used for data reduction, whereas PAF approaches are used to evaluate underlying latent concepts in the data.
#Step 2: Evaluate scree plot - looking for factors with eigenvalues >= ~1
scree(df) #from 'psych' package and graphs scree plot for PCF and PAF approaches
The scree plot shows eigenvalues from unrotated PCF and PAF solutions. The PCF shows two clear factors with a third worth looking into, whereas the PAF shows one clear factor with a second that is close to the cutoff. Given the correlations we examined, we will start our exploratory analysis with a three-factor solution. If two factors do in fact fit the data better than three, the factor analysis will show that.
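The eigenvalues behind a scree plot can be approximated in base R from the rounded correlation matrix printed earlier. In the sketch below, the component eigenvalues come from the full correlation matrix, and the factor-style eigenvalues come from a reduced matrix whose diagonal holds squared multiple correlations. That second step is an assumed, simplified stand-in for the communality estimates that scree() uses, so the numbers need not match the plot exactly.

```r
items <- c("hope", "afraid", "outrage", "angry", "happy",
           "worried", "proud", "irritated", "nervous")
# Rounded correlation matrix copied from the printed output.
R <- matrix(c(
   1.000, -0.420, -0.335, -0.357,  0.638, -0.420,  0.642, -0.370, -0.436,
  -0.420,  1.000,  0.627,  0.648, -0.453,  0.751, -0.416,  0.604,  0.763,
  -0.335,  0.627,  1.000,  0.766, -0.417,  0.650, -0.394,  0.691,  0.622,
  -0.357,  0.648,  0.766,  1.000, -0.423,  0.656, -0.400,  0.733,  0.637,
   0.638, -0.453, -0.417, -0.423,  1.000, -0.493,  0.748, -0.459, -0.479,
  -0.420,  0.751,  0.650,  0.656, -0.493,  1.000, -0.448,  0.658,  0.776,
   0.642, -0.416, -0.394, -0.400,  0.748, -0.448,  1.000, -0.438, -0.445,
  -0.370,  0.604,  0.691,  0.733, -0.459,  0.658, -0.438,  1.000,  0.603,
  -0.436,  0.763,  0.622,  0.637, -0.479,  0.776, -0.445,  0.603,  1.000
), nrow = 9, byrow = TRUE, dimnames = list(items, items))

# Component eigenvalues: eigen-decomposition of the full correlation matrix.
ev_pc <- eigen(R, symmetric = TRUE)$values

# Factor-style eigenvalues: put squared multiple correlations (SMCs) on the
# diagonal as initial communality estimates (an assumption of this sketch).
smc <- 1 - 1 / diag(solve(R))
R_reduced <- R
diag(R_reduced) <- smc
ev_fa <- eigen(R_reduced, symmetric = TRUE)$values

print(round(ev_pc, 2))
print(round(ev_fa, 2))
print(sum(ev_pc > 1))  # number of component eigenvalues above 1
```

Because the reduced matrix removes each item's unique variance from the diagonal, its eigenvalues are smaller than the component eigenvalues, which is why the two lines in a scree plot sit at different heights.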
First, we will estimate a series of factor analyses to illustrate PAF with different rotation types (none, orthogonal, and oblique), starting with no rotation.
paf_result_no <- fa(df, nfactors = 3, rotate = "none", fm = "pa") #paf model
paf_result_no #Reports same Eigenvalues as reported in Scree Plot
## Factor Analysis using method = pa
## Call: fa(r = df, nfactors = 3, rotate = "none", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 PA3 h2 u2 com
## hope -0.60 0.44 -0.04 0.55 0.45 1.9
## afraid 0.81 0.18 0.24 0.74 0.26 1.3
## outrage 0.78 0.26 -0.22 0.72 0.28 1.4
## angry 0.81 0.28 -0.27 0.80 0.20 1.5
## happy -0.69 0.51 0.04 0.74 0.26 1.8
## worried 0.84 0.17 0.20 0.77 0.23 1.2
## proud -0.67 0.55 0.08 0.76 0.24 2.0
## irritated 0.78 0.18 -0.20 0.68 0.32 1.2
## nervous 0.83 0.15 0.29 0.79 0.21 1.3
##
## PA1 PA2 PA3
## SS loadings 5.18 1.03 0.35
## Proportion Var 0.58 0.11 0.04
## Cumulative Var 0.58 0.69 0.73
## Proportion Explained 0.79 0.16 0.05
## Cumulative Proportion 0.79 0.95 1.00
##
## Mean item complexity = 1.5
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 6.46 with Chi Square = 19349.78
## df of the model are 12 and the objective function was 0.02
##
## The root mean square of the residuals (RMSR) is 0.01
## The df corrected root mean square of the residuals is 0.01
##
## The harmonic n.obs is 2993 with the empirical chi square 8.11 with prob < 0.78
## The total n.obs was 3000 with Likelihood Chi Square = 54.24 with prob < 2.5e-07
##
## Tucker Lewis Index of factoring reliability = 0.993
## RMSEA index = 0.034 and the 90 % confidence intervals are 0.025 0.044
## BIC = -41.83
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## PA1 PA2 PA3
## Correlation of (regression) scores with factors 0.98 0.89 0.77
## Multiple R square of scores with factors 0.95 0.79 0.60
## Minimum correlation of possible factor scores 0.91 0.58 0.20
Let’s evaluate the results. The first thing to review is the ‘SS loadings’ row of results. The three values shown in that row are the eigenvalues for the 3 unique factors we specified. The first two factors both have eigenvalues >1 while the third factor’s eigenvalue is below 1. We also want to evaluate the proportion of the variance that each factor explains. Factor 1 clearly explains the most (~58%) while the third factor only adds 4% of additional explained variance.
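The ‘SS loadings’ and ‘Proportion Var’ rows can be reproduced by hand from the loading matrix. The sketch below retypes the rounded unrotated loadings from the output above, so its results agree with the printed values only up to rounding error.

```r
items <- c("hope", "afraid", "outrage", "angry", "happy",
           "worried", "proud", "irritated", "nervous")
# Unrotated PAF loadings copied (rounded to two decimals) from the output.
L <- matrix(c(
  -0.60, 0.44, -0.04,
   0.81, 0.18,  0.24,
   0.78, 0.26, -0.22,
   0.81, 0.28, -0.27,
  -0.69, 0.51,  0.04,
   0.84, 0.17,  0.20,
  -0.67, 0.55,  0.08,
   0.78, 0.18, -0.20,
   0.83, 0.15,  0.29
), nrow = 9, byrow = TRUE, dimnames = list(items, c("PA1", "PA2", "PA3")))

ss_loadings <- colSums(L^2)           # sum of squared loadings per factor
prop_var    <- ss_loadings / nrow(L)  # share of total variance (9 standardized items)
h2          <- rowSums(L^2)           # communality of each item
u2          <- 1 - h2                 # uniqueness of each item

print(round(rbind(ss_loadings, prop_var), 2))
print(round(cbind(h2, u2), 2))
```

Each column sum of squared loadings is the variance that factor accounts for, and each row sum is the share of an item's variance explained by all three factors together, which is what the h2 column reports.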
Next, review the actual factors and see which measures load on which factor (this is why we want to give the variables a name that is intuitive). We see that the six negative emotions all seem to load on Factor 1 while the three positive emotions seem to load on Factor 2. The third factor seems to loosely be related to political anxiety and includes being afraid, worried, and nervous. While Factor 3 is not clearly unique in the unrotated factor analysis, the fact that there are reasonably strong factor loadings indicates that rotation may help to reveal a clearer pattern in the results.
The final items to review are the fit statistics. Specifically, we want to examine the Tucker-Lewis Index (TLI) and the Root Mean Square Error of Approximation (RMSEA). For the TLI, a “good” fitting model has a value of >= .95. Here, we see a TLI of .993, which is above that traditional cut-point. For the RMSEA, the traditional cut-point for a well-fitting factor analysis is less than .05, and here we see a value of .034. We will also use these values to assess the performance of the 3-factor solution against the 2-factor solution shortly. Right now, the 3-factor solution looks good, but we still need to do more investigation.
Finally, we can also graph the factor results. Note that in the unrotated results all nine emotions load most strongly on Factor 1, even though the positive emotions are negatively related to the negative emotions. This graph takes the absolute value of the factor loadings and matches each item to the latent factor on which it has its highest loading.
fa.diagram(paf_result_no) #Graphs the relationship
Given what we saw in the correlation matrix, in addition to the strong factor loadings from the unrotated paf model, we should go ahead and rotate our factor analysis results. Here we will use an orthogonal rotation, varimax, which keeps the latent factors uncorrelated with one another. It’s common practice to suppress small factor loadings (<.3), which we do here by adding cut = 0.3 to the results display. For instance, print(paf_result_var3, cut = 0.3) will hide loadings smaller than 0.3 in absolute value in the output of the PAF model with the varimax orthogonal rotation.
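To see what an orthogonal rotation does and does not change, the sketch below applies base R's varimax() to the rounded unrotated loadings printed earlier. It is an illustration under that rounding, not a replacement for the fa() call, and its rotated values may differ slightly from the fa() output in magnitude, sign, or column order.

```r
items <- c("hope", "afraid", "outrage", "angry", "happy",
           "worried", "proud", "irritated", "nervous")
# Unrotated PAF loadings copied (rounded) from the earlier output.
L <- matrix(c(
  -0.60, 0.44, -0.04,
   0.81, 0.18,  0.24,
   0.78, 0.26, -0.22,
   0.81, 0.28, -0.27,
  -0.69, 0.51,  0.04,
   0.84, 0.17,  0.20,
  -0.67, 0.55,  0.08,
   0.78, 0.18, -0.20,
   0.83, 0.15,  0.29
), nrow = 9, byrow = TRUE, dimnames = list(items, c("PA1", "PA2", "PA3")))

rot   <- varimax(L)            # orthogonal rotation from the stats package
L_rot <- unclass(rot$loadings) # plain matrix of rotated loadings

# Rotation redistributes variance across factors ...
print(round(rbind(unrotated = colSums(L^2), rotated = colSums(L_rot^2)), 2))

# ... but leaves each item's communality (row sum of squares) unchanged.
print(round(cbind(h2_unrotated = rowSums(L^2), h2_rotated = rowSums(L_rot^2)), 3))

# Mimic cut = 0.3 by blanking small loadings for display only.
L_show <- round(L_rot, 2)
L_show[abs(L_show) < 0.3] <- NA
print(L_show)
```

The rotation matrix is orthogonal, so the total explained variance and the h2 and u2 columns stay the same while the per-factor sums of squared loadings become more even.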
#####Principal Axis Factor Analysis, 3 factor solution with no rotation, orthogonal (varimax) & oblique (oblimin)
paf_result_no3 <- fa(df, nfactors = 3, rotate = "none", fm = "pa") #paf model
paf_result_no3 #Reports same Eigenvalues as reported in Scree Plot
## Factor Analysis using method = pa
## Call: fa(r = df, nfactors = 3, rotate = "none", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 PA3 h2 u2 com
## hope -0.60 0.44 -0.04 0.55 0.45 1.9
## afraid 0.81 0.18 0.24 0.74 0.26 1.3
## outrage 0.78 0.26 -0.22 0.72 0.28 1.4
## angry 0.81 0.28 -0.27 0.80 0.20 1.5
## happy -0.69 0.51 0.04 0.74 0.26 1.8
## worried 0.84 0.17 0.20 0.77 0.23 1.2
## proud -0.67 0.55 0.08 0.76 0.24 2.0
## irritated 0.78 0.18 -0.20 0.68 0.32 1.2
## nervous 0.83 0.15 0.29 0.79 0.21 1.3
##
## PA1 PA2 PA3
## SS loadings 5.18 1.03 0.35
## Proportion Var 0.58 0.11 0.04
## Cumulative Var 0.58 0.69 0.73
## Proportion Explained 0.79 0.16 0.05
## Cumulative Proportion 0.79 0.95 1.00
##
## Mean item complexity = 1.5
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 6.46 with Chi Square = 19349.78
## df of the model are 12 and the objective function was 0.02
##
## The root mean square of the residuals (RMSR) is 0.01
## The df corrected root mean square of the residuals is 0.01
##
## The harmonic n.obs is 2993 with the empirical chi square 8.11 with prob < 0.78
## The total n.obs was 3000 with Likelihood Chi Square = 54.24 with prob < 2.5e-07
##
## Tucker Lewis Index of factoring reliability = 0.993
## RMSEA index = 0.034 and the 90 % confidence intervals are 0.025 0.044
## BIC = -41.83
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## PA1 PA2 PA3
## Correlation of (regression) scores with factors 0.98 0.89 0.77
## Multiple R square of scores with factors 0.95 0.79 0.60
## Minimum correlation of possible factor scores 0.91 0.58 0.20
paf_result_var3 <- fa(df, nfactors = 3, rotate = "varimax", fm = "pa") #paf model
print(paf_result_var3, cut = 0.3)
## Factor Analysis using method = pa
## Call: fa(r = df, nfactors = 3, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA3 PA2 PA1 h2 u2 com
## hope 0.69 0.55 0.45 1.3
## afraid 0.43 0.70 0.74 0.26 2.0
## outrage 0.74 0.36 0.72 0.28 1.6
## angry 0.80 0.34 0.80 0.20 1.5
## happy 0.80 0.74 0.26 1.3
## worried 0.47 0.68 0.77 0.23 2.2
## proud 0.83 0.76 0.24 1.2
## irritated 0.69 0.34 0.68 0.32 1.8
## nervous 0.40 0.74 0.79 0.21 1.9
##
## PA3 PA2 PA1
## SS loadings 2.35 2.22 1.98
## Proportion Var 0.26 0.25 0.22
## Cumulative Var 0.26 0.51 0.73
## Proportion Explained 0.36 0.34 0.30
## Cumulative Proportion 0.36 0.70 1.00
##
## Mean item complexity = 1.7
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 6.46 with Chi Square = 19349.78
## df of the model are 12 and the objective function was 0.02
##
## The root mean square of the residuals (RMSR) is 0.01
## The df corrected root mean square of the residuals is 0.01
##
## The harmonic n.obs is 2993 with the empirical chi square 8.11 with prob < 0.78
## The total n.obs was 3000 with Likelihood Chi Square = 54.24 with prob < 2.5e-07
##
## Tucker Lewis Index of factoring reliability = 0.993
## RMSEA index = 0.034 and the 90 % confidence intervals are 0.025 0.044
## BIC = -41.83
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## PA3 PA2 PA1
## Correlation of (regression) scores with factors 0.88 0.91 0.86
## Multiple R square of scores with factors 0.77 0.83 0.74
## Minimum correlation of possible factor scores 0.55 0.67 0.47
paf_result_obl3 <- fa(df, nfactors = 3, rotate = "oblimin", fm = "pa") #PAF approach with oblimin rotation
## Loading required namespace: GPArotation
print (paf_result_obl3, cut = 0.3) #Rotation reveals cleaner factors that are obscured
## Factor Analysis using method = pa
## Call: fa(r = df, nfactors = 3, rotate = "oblimin", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA3 PA1 PA2 h2 u2 com
## hope 0.71 0.55 0.45 1.1
## afraid 0.83 0.74 0.26 1.0
## outrage 0.82 0.72 0.28 1.0
## angry 0.92 0.80 0.20 1.0
## happy 0.83 0.74 0.26 1.0
## worried 0.77 0.77 0.23 1.0
## proud 0.89 0.76 0.24 1.0
## irritated 0.74 0.68 0.32 1.0
## nervous 0.92 0.79 0.21 1.0
##
## PA3 PA1 PA2
## SS loadings 2.31 2.20 2.05
## Proportion Var 0.26 0.24 0.23
## Cumulative Var 0.26 0.50 0.73
## Proportion Explained 0.35 0.34 0.31
## Cumulative Proportion 0.35 0.69 1.00
##
## With factor correlations of
## PA3 PA1 PA2
## PA3 1.00 0.82 -0.59
## PA1 0.82 1.00 -0.53
## PA2 -0.59 -0.53 1.00
##
## Mean item complexity = 1
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 6.46 with Chi Square = 19349.78
## df of the model are 12 and the objective function was 0.02
##
## The root mean square of the residuals (RMSR) is 0.01
## The df corrected root mean square of the residuals is 0.01
##
## The harmonic n.obs is 2993 with the empirical chi square 8.11 with prob < 0.78
## The total n.obs was 3000 with Likelihood Chi Square = 54.24 with prob < 2.5e-07
##
## Tucker Lewis Index of factoring reliability = 0.993
## RMSEA index = 0.034 and the 90 % confidence intervals are 0.025 0.044
## BIC = -41.83
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## PA3 PA1 PA2
## Correlation of (regression) scores with factors 0.96 0.95 0.94
## Multiple R square of scores with factors 0.92 0.91 0.88
## Minimum correlation of possible factor scores 0.84 0.82 0.77
fa.diagram(paf_result_var3) #Graphs the relationship
fa.diagram(paf_result_obl3) #Graphs the relationship
Let’s review the same three items in this analysis. Starting with the “SS loadings”, we see that the values are much more evenly spread across the factors in the rotated output, at roughly 2.0 to 2.4 for all three factors. Because of this, the proportion of total variance explained is also very similar across the three factors. This indicates that the rotation was needed and successful in revealing a clearer pattern in the data. There is a strong possibility of three factors based on these results, but we will still compare these results to a 2-factor solution.
Next, by examining the factor loadings for each latent factor, we see that the results follow our theoretical expectations. Anger, outrage, and irritation loaded on Factor 1; hope, happiness, and pride loaded together on Factor 2; and fear, worry, and nervousness loaded on Factor 3. When we graph the results we see that clear pattern as well. Generally, the conclusion seems to be that political emotions are structured into three distinct latent factors.
Here, we will re-estimate our factor analysis specifying only 2 factors and compare the output to what we just produced above. We use only the oblique rotation here, but the same procedure would apply to an orthogonal rotation as well.
paf_result_obl2 <- fa(df, nfactors = 2, rotate = "oblimin", fm = "pa") #PAF approach with oblimin rotation
print(paf_result_obl2, cut = 0.3) #Rotation reveals cleaner factors that are obscured
## Factor Analysis using method = pa
## Call: fa(r = df, nfactors = 2, rotate = "oblimin", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 h2 u2 com
## hope 0.73 0.55 0.45 1
## afraid 0.80 0.68 0.32 1
## outrage 0.86 0.67 0.33 1
## angry 0.88 0.71 0.29 1
## happy 0.84 0.74 0.26 1
## worried 0.82 0.72 0.28 1
## proud 0.88 0.74 0.26 1
## irritated 0.78 0.63 0.37 1
## nervous 0.78 0.68 0.32 1
##
## PA1 PA2
## SS loadings 4.06 2.07
## Proportion Var 0.45 0.23
## Cumulative Var 0.45 0.68
## Proportion Explained 0.66 0.34
## Cumulative Proportion 0.66 1.00
##
## With factor correlations of
## PA1 PA2
## PA1 1.00 -0.61
## PA2 -0.61 1.00
##
## Mean item complexity = 1
## Test of the hypothesis that 2 factors are sufficient.
##
## df null model = 36 with the objective function = 6.46 with Chi Square = 19349.78
## df of the model are 19 and the objective function was 0.39
##
## The root mean square of the residuals (RMSR) is 0.04
## The df corrected root mean square of the residuals is 0.05
##
## The harmonic n.obs is 2993 with the empirical chi square 296.5 with prob < 1e-51
## The total n.obs was 3000 with Likelihood Chi Square = 1155.67 with prob < 2.8e-233
##
## Tucker Lewis Index of factoring reliability = 0.888
## RMSEA index = 0.141 and the 90 % confidence intervals are 0.134 0.148
## BIC = 1003.55
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## PA1 PA2
## Correlation of (regression) scores with factors 0.96 0.94
## Multiple R square of scores with factors 0.93 0.88
## Minimum correlation of possible factor scores 0.86 0.77
As expected, the results return only two factors. This solution accounts for 68% of the total variance, compared to 73% in the three-factor solution. This is our first indication that the 3-factor solution may be better for this set of data. Next, we look at the factor loadings themselves and see the six negative emotions loading on one factor and the three positive emotions loading on the other, which is exactly what we expected. Note that the communality (h2) and uniqueness (u2) values are unaffected by rotation, because rotation only redistributes the explained variance among the factors, but they do depend on the number of factors extracted. Here the communalities of the negative emotions are lower in the 2-factor solution than in the 3-factor solution (for example, afraid drops from 0.74 to 0.68 and angry from 0.80 to 0.71), while those of the positive emotions are nearly unchanged.
The final aspect we will review is the fit statistics, comparing them to the values observed in the 3-factor solution. We note a TLI value of 0.888 in the 2-factor solution, which is lower than the traditional cut-point of 0.95 and the TLI value of 0.993 from the 3-factor solution. Similarly, the RMSEA value for the 2-factor solution is 0.141, considerably higher than the traditional cut-point of 0.05 and the observed RMSEA value of 0.034 in the 3-factor solution. Taken as a whole, the 3-factor solution better fits the multidimensionality of this dataset.
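Both fit indices can be recomputed from the chi-square values, degrees of freedom, and sample size printed in the two outputs. The sketch below uses common textbook formulas for the RMSEA and TLI; software packages differ in small details (for example, dividing by n rather than n - 1), so treat the functions `rmsea` and `tli` as illustrative helpers rather than the exact code psych runs.

```r
# RMSEA: excess chi-square per degree of freedom and per observation.
rmsea <- function(chisq, df, n) {
  sqrt(max((chisq - df) / (df * (n - 1)), 0))
}

# TLI: improvement of the model's chi-square/df ratio over the null model's.
tli <- function(chisq, df, chisq_null, df_null) {
  ratio_null  <- chisq_null / df_null
  ratio_model <- chisq / df
  (ratio_null - ratio_model) / (ratio_null - 1)
}

# Values copied from the printed output (likelihood chi-square rows).
n          <- 3000
chisq_null <- 19349.78
df_null    <- 36

three <- list(chisq = 54.24,   df = 12)  # 3-factor solution
two   <- list(chisq = 1155.67, df = 19)  # 2-factor solution

fit_table <- rbind(
  three_factor = c(RMSEA = rmsea(three$chisq, three$df, n),
                   TLI   = tli(three$chisq, three$df, chisq_null, df_null)),
  two_factor   = c(RMSEA = rmsea(two$chisq, two$df, n),
                   TLI   = tli(two$chisq, two$df, chisq_null, df_null))
)
print(round(fit_table, 3))
```

Seeing the formulas makes the comparison easier to read: the 2-factor model saves only 7 degrees of freedom but its chi-square is roughly twenty times larger, which drives both indices in the wrong direction.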
The following code compares the paf approach used above with the pcf approach. Note the slight difference in code between the two: the pcf approach calls principal(), while the paf approach calls fa() with fm = "pa". We can also indicate which form of rotation we want for each type of factor analysis. Here, we run the model with no rotation, with varimax for the orthogonal rotation, and with oblimin for the oblique rotation. We start with a basic three-factor pcf approach without rotation.
#General pcf code "principal(data frame, nfactors=x, rotate)"
#General paf code "fa(data frame, nfactors=x, rotate)"
#####Principal Components Factor Analysis, 3 factor solution with no rotation, orthogonal (varimax) & oblique (oblimin)
pcf_result_no3 <- principal(df, nfactors = 3, rotate = "none") #PCF approach with no rotation
pcf_result_no3 #Reports same Eigenvalues as reported in Scree Plot
## Principal Components Analysis
## Call: principal(r = df, nfactors = 3, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 PC3 h2 u2 com
## hope -0.63 0.58 -0.09 0.74 0.26 2.0
## afraid 0.83 0.21 0.34 0.84 0.16 1.5
## outrage 0.80 0.31 -0.29 0.82 0.18 1.6
## angry 0.82 0.31 -0.28 0.84 0.16 1.5
## happy -0.71 0.55 0.08 0.81 0.19 1.9
## worried 0.85 0.19 0.26 0.83 0.17 1.3
## proud -0.68 0.59 0.12 0.82 0.18 2.0
## irritated 0.81 0.22 -0.32 0.80 0.20 1.5
## nervous 0.84 0.17 0.36 0.86 0.14 1.4
##
## PC1 PC2 PC3
## SS loadings 5.44 1.32 0.60
## Proportion Var 0.60 0.15 0.07
## Cumulative Var 0.60 0.75 0.82
## Proportion Explained 0.74 0.18 0.08
## Cumulative Proportion 0.74 0.92 1.00
##
## Mean item complexity = 1.6
## Test of the hypothesis that 3 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.05
## with the empirical chi square 490.65 with prob < 2.2e-97
##
## Fit based upon off diagonal values = 0.99
pcf_result_var3 <- principal(df, nfactors = 3, rotate = "varimax") #PCF approach with varimax rotation
pcf_result_var3 #Rotation reveals cleaner factors that are obscured
## Principal Components Analysis
## Call: principal(r = df, nfactors = 3, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC1 RC2 RC3 h2 u2 com
## hope -0.08 0.82 -0.27 0.74 0.26 1.2
## afraid 0.37 -0.23 0.81 0.84 0.16 1.6
## outrage 0.81 -0.18 0.36 0.82 0.18 1.5
## angry 0.82 -0.19 0.37 0.84 0.16 1.5
## happy -0.25 0.84 -0.20 0.81 0.19 1.3
## worried 0.43 -0.26 0.76 0.83 0.17 1.9
## proud -0.24 0.86 -0.14 0.82 0.18 1.2
## irritated 0.80 -0.26 0.31 0.80 0.20 1.5
## nervous 0.35 -0.27 0.82 0.86 0.14 1.6
##
## RC1 RC2 RC3
## SS loadings 2.53 2.45 2.38
## Proportion Var 0.28 0.27 0.26
## Cumulative Var 0.28 0.55 0.82
## Proportion Explained 0.34 0.33 0.32
## Cumulative Proportion 0.34 0.68 1.00
##
## Mean item complexity = 1.5
## Test of the hypothesis that 3 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.05
## with the empirical chi square 490.65 with prob < 2.2e-97
##
## Fit based upon off diagonal values = 0.99
pcf_result_obl3 <- principal(df, nfactors = 3, rotate = "oblimin") #PCF approach with oblimin rotation
pcf_result_obl3 #Rotation reveals cleaner factors that are obscured
## Principal Components Analysis
## Call: principal(r = df, nfactors = 3, rotate = "oblimin")
## Standardized loadings (pattern matrix) based upon correlation matrix
## TC3 TC1 TC2 h2 u2 com
## hope -0.18 0.17 0.84 0.74 0.26 1.2
## afraid 0.91 0.02 0.02 0.84 0.16 1.0
## outrage 0.06 0.87 0.02 0.82 0.18 1.0
## angry 0.07 0.87 0.01 0.84 0.16 1.0
## happy 0.00 -0.09 0.86 0.81 0.19 1.0
## worried 0.80 0.13 -0.02 0.83 0.17 1.1
## proud 0.10 -0.11 0.90 0.82 0.18 1.1
## irritated -0.01 0.86 -0.08 0.80 0.20 1.0
## nervous 0.93 -0.02 -0.03 0.86 0.14 1.0
##
## TC3 TC1 TC2
## SS loadings 2.53 2.48 2.34
## Proportion Var 0.28 0.28 0.26
## Cumulative Var 0.28 0.56 0.82
## Proportion Explained 0.34 0.34 0.32
## Cumulative Proportion 0.34 0.68 1.00
##
## With component correlations of
## TC3 TC1 TC2
## TC3 1.00 0.72 -0.51
## TC1 0.72 1.00 -0.45
## TC2 -0.51 -0.45 1.00
##
## Mean item complexity = 1
## Test of the hypothesis that 3 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.05
## with the empirical chi square 490.65 with prob < 2.2e-97
##
## Fit based upon off diagonal values = 0.99
fa.diagram(pcf_result_obl3)
Results of the PCF approach broadly match the PAF results. This is not surprising, as the two methods often produce very similar results.
Because we have a priori theory about the appropriate factor structure for these 9 political emotions, we can also use confirmatory factor analysis to test whether three factors best fit the data.
The ‘lavaan’ package is needed to conduct this analysis. Because we are testing a priori theory with this approach, we have to specify which items load on which latent factor. First, we will use the three-factor solution that theory suggests we will find:
####Confirmatory Factor Analysis
# Load required packages
library(lavaan)
# Specify the CFA model; it must include the #
model <- '
# Factor 1
Factor1 =~ outrage + angry + irritated
# Factor 2
Factor2 =~ proud + happy + hope
# Factor 3
Factor3 =~ afraid + nervous + worried
'
# Step 3: Fit the CFA model
fit <- cfa(model, data = df)
# Step 4: Summarize the results
summary(fit, standardized = TRUE, fit.measures = TRUE) #Gives you summary statistics of the CFA
## lavaan 0.6.15 ended normally after 33 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 21
##
## Used Total
## Number of observations 2979 3000
##
## Model Test User Model:
##
## Test statistic 172.425
## Degrees of freedom 24
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 19285.634
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.992
## Tucker-Lewis Index (TLI) 0.988
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -32679.903
## Loglikelihood unrestricted model (H1) NA
##
## Akaike (AIC) 65401.807
## Bayesian (BIC) 65527.793
## Sample-size adjusted Bayesian (SABIC) 65461.068
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.046
## 90 Percent confidence interval - lower 0.039
## 90 Percent confidence interval - upper 0.052
## P-value H_0: RMSEA <= 0.050 0.866
## P-value H_0: RMSEA >= 0.080 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.018
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Factor1 =~
## outrage 1.000 1.069 0.853
## angry 1.004 0.016 61.161 0.000 1.073 0.888
## irritated 0.876 0.016 55.002 0.000 0.936 0.826
## Factor2 =~
## proud 1.000 0.990 0.856
## happy 0.939 0.018 53.034 0.000 0.929 0.872
## hope 0.875 0.020 44.844 0.000 0.865 0.743
## Factor3 =~
## afraid 1.000 1.048 0.860
## nervous 0.996 0.016 62.424 0.000 1.043 0.877
## worried 0.967 0.015 63.221 0.000 1.014 0.884
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Factor1 ~~
## Factor2 -0.597 0.026 -22.979 0.000 -0.565 -0.565
## Factor3 0.949 0.032 29.954 0.000 0.847 0.847
## Factor2 ~~
## Factor3 -0.640 0.026 -24.582 0.000 -0.617 -0.617
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .outrage 0.426 0.015 28.140 0.000 0.426 0.272
## .angry 0.309 0.013 24.122 0.000 0.309 0.212
## .irritated 0.407 0.013 30.299 0.000 0.407 0.317
## .proud 0.357 0.016 22.904 0.000 0.357 0.267
## .happy 0.273 0.013 20.897 0.000 0.273 0.240
## .hope 0.608 0.019 32.074 0.000 0.608 0.448
## .afraid 0.387 0.013 28.967 0.000 0.387 0.261
## .nervous 0.325 0.012 27.184 0.000 0.325 0.230
## .worried 0.286 0.011 26.354 0.000 0.286 0.218
## Factor1 1.142 0.040 28.233 0.000 1.000 1.000
## Factor2 0.979 0.036 27.466 0.000 1.000 1.000
## Factor3 1.098 0.038 28.756 0.000 1.000 1.000
semPaths(fit, "std", whatLabels = "est", edge.label.cex = 0.8) #Graphs the CFA factor loadings
There are several things to evaluate in the results of a confirmatory factor analysis. We are more concerned with model fit here than we are with exploratory factor analysis, since we are testing specific hypotheses. We will evaluate CFA model fit using the fit indices reported in the summary output: the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR).
All of the model fit parameters for the three-factor solution indicate a good-fitting model. This provides initial support that our hypothesized latent emotions do in fact exist. However, we should change our model to evaluate whether other factor solutions might fit the data better. We will use a two-factor solution, combining the six negative emotions into one factor and the three positive emotions into another, and compare those results to the initial results.
First, we create a new model that combines the emotions in the manner previously stated. Then we review the results.
model2 <- '
# Factor 1
Factor1 =~ outrage + angry + irritated +
           afraid + nervous + worried
# Factor 2
Factor2 =~ proud + happy + hope
'
# Step 3: Fit the two-factor CFA model
fit2 <- cfa(model2, data = df)
# Step 4: Summarize the results
summary(fit2, standardized = TRUE, fit.measures = TRUE)
## lavaan 0.6.15 ended normally after 25 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
##
## Used Total
## Number of observations 2979 3000
##
## Model Test User Model:
##
## Test statistic 1198.925
## Degrees of freedom 26
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 19285.634
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.939
## Tucker-Lewis Index (TLI) 0.916
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -33193.153
## Loglikelihood unrestricted model (H1) NA
##
## Akaike (AIC) 66424.307
## Bayesian (BIC) 66538.294
## Sample-size adjusted Bayesian (SABIC) 66477.924
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.123
## 90 Percent confidence interval - lower 0.117
## 90 Percent confidence interval - upper 0.129
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.037
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Factor1 =~
## outrage 1.000 1.001 0.799
## angry 0.989 0.020 50.635 0.000 0.990 0.820
## irritated 0.891 0.019 47.957 0.000 0.892 0.787
## afraid 1.016 0.020 51.891 0.000 1.017 0.835
## nervous 1.000 0.019 52.493 0.000 1.001 0.842
## worried 0.986 0.018 54.189 0.000 0.987 0.861
## Factor2 =~
## proud 1.000 0.990 0.857
## happy 0.938 0.018 53.066 0.000 0.929 0.871
## hope 0.873 0.019 44.805 0.000 0.864 0.742
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Factor1 ~~
## Factor2 -0.619 0.025 -24.360 0.000 -0.624 -0.624
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .outrage 0.567 0.017 33.499 0.000 0.567 0.361
## .angry 0.479 0.015 32.672 0.000 0.479 0.328
## .irritated 0.488 0.014 33.897 0.000 0.488 0.380
## .afraid 0.451 0.014 31.948 0.000 0.451 0.303
## .nervous 0.412 0.013 31.557 0.000 0.412 0.292
## .worried 0.339 0.011 30.257 0.000 0.339 0.258
## .proud 0.356 0.016 22.821 0.000 0.356 0.266
## .happy 0.274 0.013 20.897 0.000 0.274 0.241
## .hope 0.609 0.019 32.113 0.000 0.609 0.449
## Factor1 1.002 0.039 25.805 0.000 1.000 1.000
## Factor2 0.981 0.036 27.494 0.000 1.000 1.000
semPaths(fit2, "std", whatLabels = "est", edge.label.cex = 0.8) #Graphs the CFA factor loadings
Evaluating the same model fit parameters as before, we can immediately see a worse fitting model compared to the prior three factor solution.
Across all of the model fit parameters, the three-factor solution fits the data better than the two-factor solution that combines all the negatively valenced emotions. These results match the results from the exploratory factor analysis as well.
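The degrees of freedom in the two lavaan summaries, and a rough chi-square difference test between the models, can be worked out by hand from the printed output. The sketch below assumes the default marker-variable scaling (the first loading of each factor fixed to 1, as the 1.000 loadings in the output suggest). The two-factor model is the three-factor model with the anger and anxiety factors merged, so the comparison sits on a parameter boundary and the p-value should be read as approximate.

```r
p <- 9                          # observed variables
n_moments <- p * (p + 1) / 2    # unique variances and covariances: 45

# Free parameters = free loadings + residual variances
#                   + factor variances + factor covariances
npar3 <- (9 - 3) + 9 + 3 + 3    # three-factor model
npar2 <- (9 - 2) + 9 + 2 + 1    # two-factor model

df3 <- n_moments - npar3
df2 <- n_moments - npar2

# Chi-square test statistics copied from the two summary outputs.
chisq3 <- 172.425
chisq2 <- 1198.925

delta_chisq <- chisq2 - chisq3
delta_df    <- df2 - df3
p_value     <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)

print(c(npar3 = npar3, df3 = df3, npar2 = npar2, df2 = df2))
print(c(delta_chisq = delta_chisq, delta_df = delta_df))
print(p_value)
```

With the fitted objects available, lavaan's lavTestLRT(fit, fit2) is the usual way to run this kind of comparison directly; the hand calculation is only meant to show where the numbers come from.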
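In this tutorial, we imported survey data and applied several factor analysis techniques to political emotions in the United States. The results largely follow the prevailing theoretical belief that there are three distinct latent emotional factors: aversion (anger, outrage, and irritation), anxiety (fear, worry, and nervousness), and a positive-emotion factor (hope, happiness, and pride).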