Factor Analysis Tutorial Using Political Emotions

Factor Analysis: Principal Axis & Principal Component
Confirmatory Factor Analysis (CFA)
Conclusions

# Packages used in this tutorial (the original loading code is not shown;
# loading lavaan after psych masks psych::cor2cov)
library(haven)
library(dplyr)     # for %>% and select()
library(skimr)
library(psych)
library(corrplot)
library(lavaan)
library(semPlot)
# The 2020 ANES data are read into a data frame named `anes` (code not shown)

In this tutorial, we conduct both exploratory and confirmatory factor analysis using data from the nationally representative 2020 American National Election Studies (ANES). The survey asked respondents about nine political emotions, and we use these nine items to understand how political emotions are structured. It is widely believed that three latent emotional factors best explain the structure of political emotions (Marcus et al. 2006):

Aversion to Politics: Measures include anger, outrage, and irritation
Worry about Politics: Measures include fear, worry, and nervousness
Enthusiasm about Politics: Measures include happiness, hope, and pride

We will use factor analysis to test whether a three-factor solution really is the best way to explain these political emotions.

df <- anes %>% 
  dplyr::select(V201115, V201116, V201117, V201118, V201119, V201120,
                V201121, V201122, V201123) #Save only the variables you want to include in your analysis
df[df <= -1] <- NA #Recode negative (non-substantive) codes to NA

new_names <- c("hope", "afraid", "outrage", "angry", "happy", "worried", "proud", "irritated", "nervous") #Informative names, in the same order as the columns selected above

# Update column names
colnames(df) <- new_names #Apply new names to your data frame

skim(df) #Checks the variables in your data frame; evaluate for missing data

Data summary
Name	df
Number of rows	3000
Number of columns	9
_______________________
Column type frequency:
numeric	9
________________________
Group variables	None

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
hope	6	1	2.52	1.17	1	2	3	3	5	▆▇▇▃▂
afraid	4	1	3.38	1.22	1	3	3	4	5	▂▅▇▇▆
outrage	3	1	3.58	1.25	1	3	4	5	5	▂▃▆▇▇
angry	2	1	3.59	1.21	1	3	4	5	5	▂▃▇▇▇
happy	7	1	1.96	1.07	1	1	2	3	5	▇▃▃▁▁
worried	2	1	3.68	1.14	1	3	4	5	5	▁▃▆▇▇
proud	9	1	2.02	1.16	1	1	2	3	5	▇▃▃▂▁
irritated	2	1	3.84	1.13	1	3	4	5	5	▁▂▅▇▇
nervous	1	1	3.53	1.19	1	3	4	5	5	▂▃▇▇▇

To start, we read in our data, in this case the 2020 American National Election Studies. Then we save a new data frame that includes only the nine political emotion variables we want to analyze, and we use this data frame throughout the code. Once we have saved the political emotion variables as a new data frame, we recode all negative values to NA, because these are non-substantive responses that should be removed (as indicated in the associated codebook).

Next, we change the variable names to something more informative. This aids interpretation of the factor analysis results and should not be skipped. Lastly, we use the skimr package to quickly skim the variables in our dataset. The minimum value (p0) should be 1 for every item, with no negative values left, since negative values must be treated as missing data. Here every item has a minimum of 1, and each item has only 1 to 9 missing values out of 3,000 respondents.
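The recode used above (`df[df <= -1] <- NA`) can be checked on a small toy data frame before it is applied to the survey. The values below are made up for illustration. The negative codes (-9 and -1) only stand in for the codebook's non-substantive responses.

```r
# Hypothetical toy data: two emotion items on a 1-5 scale, with negative
# codes standing in for non-substantive answers
toy <- data.frame(hope   = c(3, -9, 2, 5),
                  afraid = c(-1, 4, 4, 1))

toy[toy <= -1] <- NA              # same recode used on the ANES items

colSums(is.na(toy))               # missing values per item
sapply(toy, min, na.rm = TRUE)    # smallest valid value should be >= 1
```

This mirrors what the `n_missing` and `p0` columns of the skim output report for the real items.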

Step 1: Evaluating Correlations Between Political Emotions

The first step is to check the correlations between your variables (here, political emotions) to see how strongly each pair of items is related. Below, we create a matrix with all the correlations between the individual items and plot them as a heat map for easier viewing.

#Step 1: Evaluate correlations 
cor_matrix<-cor(df, use = "pairwise.complete.obs") #Saves correlation matrix
corrplot(cor_matrix, method = "circle", type="lower", diag = FALSE) # Plot correlation matrix as a heatmap

cor_matrix <- round(cor_matrix, 3)
cor_matrix

##             hope afraid outrage  angry  happy worried  proud irritated nervous
## hope       1.000 -0.420  -0.335 -0.357  0.638  -0.420  0.642    -0.370  -0.436
## afraid    -0.420  1.000   0.627  0.648 -0.453   0.751 -0.416     0.604   0.763
## outrage   -0.335  0.627   1.000  0.766 -0.417   0.650 -0.394     0.691   0.622
## angry     -0.357  0.648   0.766  1.000 -0.423   0.656 -0.400     0.733   0.637
## happy      0.638 -0.453  -0.417 -0.423  1.000  -0.493  0.748    -0.459  -0.479
## worried   -0.420  0.751   0.650  0.656 -0.493   1.000 -0.448     0.658   0.776
## proud      0.642 -0.416  -0.394 -0.400  0.748  -0.448  1.000    -0.438  -0.445
## irritated -0.370  0.604   0.691  0.733 -0.459   0.658 -0.438     1.000   0.603
## nervous   -0.436  0.763   0.622  0.637 -0.479   0.776 -0.445     0.603   1.000

###At least 2, and possibly 3, distinct factors from examining the correlations

Results indicate at least two, and probably three, distinct factors in the political emotion variables. Three factors would match the dominant view in the literature on how political emotions are structured (Marcus et al. 2006). The three positive emotions are clearly positively correlated with each other (0.64 to 0.75) and have weaker, negative correlations with the other six emotions (-0.34 to -0.49). The six negative emotions are all strongly positively correlated with one another (0.60 to 0.78), which could indicate that they all measure the same concept. However, closer examination shows that being afraid is more highly correlated with being worried (0.75) or nervous (0.76) than with anger (0.65), outrage (0.63), or irritation (0.60). This also matches the underlying theory that those three items represent an “anxiety about politics” factor, whereas anger, outrage, and irritation represent an “aversion” factor.
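You can summarize the anxiety-versus-aversion pattern by averaging correlations within and between item sets. The base-R sketch below re-enters the rounded correlation matrix printed above, so it runs without the ANES file. The item groups follow the three-factor theory from the introduction. The helper `block_mean()` is a hypothetical name introduced here.

```r
# Correlations copied from the rounded output above: lower triangle,
# filled column by column in the tutorial's variable order
vars <- c("hope", "afraid", "outrage", "angry", "happy",
          "worried", "proud", "irritated", "nervous")
R <- diag(length(vars))
dimnames(R) <- list(vars, vars)
R[lower.tri(R)] <- c(
  -0.420, -0.335, -0.357,  0.638, -0.420,  0.642, -0.370, -0.436,  # hope
   0.627,  0.648, -0.453,  0.751, -0.416,  0.604,  0.763,          # afraid
   0.766, -0.417,  0.650, -0.394,  0.691,  0.622,                  # outrage
  -0.423,  0.656, -0.400,  0.733,  0.637,                          # angry
  -0.493,  0.748, -0.459, -0.479,                                  # happy
  -0.448,  0.658,  0.776,                                          # worried
  -0.438, -0.445,                                                  # proud
   0.603)                                                          # irritated
R[upper.tri(R)] <- t(R)[upper.tri(R)]   # mirror to make R symmetric

# Mean correlation within one item set (off-diagonal) or between two sets
block_mean <- function(R, a, b = a) {
  block <- R[a, b]
  if (identical(a, b)) mean(block[upper.tri(block)]) else mean(block)
}

anxiety  <- c("afraid", "worried", "nervous")
aversion <- c("outrage", "angry", "irritated")
positive <- c("hope", "happy", "proud")

round(c(within_anxiety    = block_mean(R, anxiety),
        within_aversion   = block_mean(R, aversion),
        anxiety_aversion  = block_mean(R, anxiety, aversion),
        within_positive   = block_mean(R, positive),
        positive_negative = block_mean(R, positive, c(anxiety, aversion))), 3)
```

Both negative-emotion groups correlate more strongly within themselves than with each other. That is the pattern a separate anxiety factor would produce.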

Conducting a factor analysis will help us better understand how these 9 individual political emotions are related to one another. We will conduct both exploratory and confirmatory factor analysis to illustrate both methods.

Factor Analysis: Principal Axis & Principal Component

We start with an exploratory approach using principal axis factoring (PAF) and then turn to principal component factor analysis (PCF), comparing the results of the two approaches at the end. PCF is generally used for data reduction, whereas PAF is used to uncover the latent concepts underlying the data: PCF analyzes the total variance of each item, while PAF analyzes only the variance the items share (their communalities).
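The difference between the two approaches can be made concrete with the correlation matrix alone. PCF eigen-decomposes the matrix with 1s on the diagonal. PAF replaces the diagonal with communality estimates and re-estimates them until they stabilize. The sketch below is a textbook version of iterated principal-axis factoring, not psych's source code. The squared-multiple-correlation starting values, the 0.001 stopping tolerance and the 100-iteration cap are our choices. Its communalities should land close to, but not necessarily exactly on, the h2 columns of the fa() and principal() output later in this tutorial.

```r
# Rounded correlations from the output above (lower triangle, by column)
vars <- c("hope", "afraid", "outrage", "angry", "happy",
          "worried", "proud", "irritated", "nervous")
R <- diag(length(vars))
dimnames(R) <- list(vars, vars)
R[lower.tri(R)] <- c(
  -0.420, -0.335, -0.357,  0.638, -0.420,  0.642, -0.370, -0.436,
   0.627,  0.648, -0.453,  0.751, -0.416,  0.604,  0.763,
   0.766, -0.417,  0.650, -0.394,  0.691,  0.622,
  -0.423,  0.656, -0.400,  0.733,  0.637,
  -0.493,  0.748, -0.459, -0.479,
  -0.448,  0.658,  0.776,
  -0.438, -0.445,
   0.603)
R[upper.tri(R)] <- t(R)[upper.tri(R)]
n_factors <- 3

# PCF: eigen-decompose R itself (1s on the diagonal = total variance)
pc <- eigen(R, symmetric = TRUE)
pc_load <- pc$vectors[, 1:n_factors] %*%
  diag(sqrt(pc$values[1:n_factors]), nrow = n_factors)
pc_h2 <- rowSums(pc_load^2)

# PAF: start from squared multiple correlations, place the communalities
# on the diagonal, and iterate until total communality changes < 0.001
h2 <- 1 - 1 / diag(solve(R))
Rr <- R
for (iter in 1:100) {
  diag(Rr) <- h2
  e <- eigen(Rr, symmetric = TRUE)
  pa_load <- e$vectors[, 1:n_factors] %*%
    diag(sqrt(e$values[1:n_factors]), nrow = n_factors)
  new_h2 <- rowSums(pa_load^2)
  change <- abs(sum(new_h2) - sum(h2))
  h2 <- new_h2
  if (change < 0.001) break
}

out <- rbind(PCF = pc_h2, PAF = h2)
colnames(out) <- vars
round(out, 2)
```

Every item has a higher communality under PCF. The reason is that PCF also counts each item's unique variance as "explained".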

Step 2: Screeplot

#Step 2: Evaluate scree plot - look for eigenvalues >= ~1 
scree(df) #from 'psych' package and graphs scree plot for PCF and PAF approaches

The scree plot shows eigenvalues from unrotated PCF and PAF solutions. The PCF line shows two clear factors with a third worth looking into, whereas the PAF line shows one clear factor with a second that is close. Given the correlations we examined, we start the exploratory analysis with a three-factor solution. If two factors do fit the data better than three, the factor analysis should show that.
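The PC line of the scree plot is just the eigenvalues of the correlation matrix. The base-R sketch below computes them from the rounded matrix printed earlier and applies the eigenvalue-greater-than-1 rule (the Kaiser criterion). Because the input is rounded to three decimals, the eigenvalues can differ from psych's in the last digit.

```r
# Rounded correlations from the output above (lower triangle, by column)
vars <- c("hope", "afraid", "outrage", "angry", "happy",
          "worried", "proud", "irritated", "nervous")
R <- diag(length(vars))
dimnames(R) <- list(vars, vars)
R[lower.tri(R)] <- c(
  -0.420, -0.335, -0.357,  0.638, -0.420,  0.642, -0.370, -0.436,
   0.627,  0.648, -0.453,  0.751, -0.416,  0.604,  0.763,
   0.766, -0.417,  0.650, -0.394,  0.691,  0.622,
  -0.423,  0.656, -0.400,  0.733,  0.637,
  -0.493,  0.748, -0.459, -0.479,
  -0.448,  0.658,  0.776,
  -0.438, -0.445,
   0.603)
R[upper.tri(R)] <- t(R)[upper.tri(R)]

ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
round(ev, 2)                      # plotting these against 1:9 gives the PC scree line
sum(ev)                           # eigenvalues of a correlation matrix sum to 9
sum(ev > 1)                       # Kaiser criterion: count of eigenvalues > 1
round(cumsum(ev) / sum(ev), 2)    # cumulative proportion of variance
```

The first three eigenvalues are about 5.44, 1.32 and 0.60. These match the SS loadings of the unrotated three-factor principal() output later in the tutorial. Only two exceed 1, which is why the third factor is described as "worth looking into" rather than clear.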

Step 3: Estimate the Factor Analysis

First, we estimate a series of PAF models to illustrate the different rotation types (none, orthogonal, and oblique), starting with no rotation.

paf_result_no <- fa(df, nfactors = 3, rotate = "none", fm="pa") #paf model
paf_result_no #Print the unrotated PAF solution

## Factor Analysis using method =  pa
## Call: fa(r = df, nfactors = 3, rotate = "none", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##             PA1  PA2   PA3   h2   u2 com
## hope      -0.60 0.44 -0.04 0.55 0.45 1.9
## afraid     0.81 0.18  0.24 0.74 0.26 1.3
## outrage    0.78 0.26 -0.22 0.72 0.28 1.4
## angry      0.81 0.28 -0.27 0.80 0.20 1.5
## happy     -0.69 0.51  0.04 0.74 0.26 1.8
## worried    0.84 0.17  0.20 0.77 0.23 1.2
## proud     -0.67 0.55  0.08 0.76 0.24 2.0
## irritated  0.78 0.18 -0.20 0.68 0.32 1.2
## nervous    0.83 0.15  0.29 0.79 0.21 1.3
## 
##                        PA1  PA2  PA3
## SS loadings           5.18 1.03 0.35
## Proportion Var        0.58 0.11 0.04
## Cumulative Var        0.58 0.69 0.73
## Proportion Explained  0.79 0.16 0.05
## Cumulative Proportion 0.79 0.95 1.00
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  36  with the objective function =  6.46 with Chi Square =  19349.78
## df of  the model are 12  and the objective function was  0.02 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.01 
## 
## The harmonic n.obs is  2993 with the empirical chi square  8.11  with prob <  0.78 
## The total n.obs was  3000  with Likelihood Chi Square =  54.24  with prob <  2.5e-07 
## 
## Tucker Lewis Index of factoring reliability =  0.993
## RMSEA index =  0.034  and the 90 % confidence intervals are  0.025 0.044
## BIC =  -41.83
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA1  PA2  PA3
## Correlation of (regression) scores with factors   0.98 0.89 0.77
## Multiple R square of scores with factors          0.95 0.79 0.60
## Minimum correlation of possible factor scores     0.91 0.58 0.20

Let’s evaluate the results. The first thing to review is the ‘SS loadings’ (sum of squared loadings) row. For an unrotated solution, these three values are the eigenvalues of the three factors we specified. The first two factors have eigenvalues >1 (5.18 and 1.03), while the third factor’s eigenvalue (0.35) is below 1. We also want to evaluate the proportion of variance each factor explains: Factor 1 clearly explains the most (~58%), while the third factor adds only 4% of additional explained variance.
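The base-R sketch below recomputes the 'SS loadings', 'Proportion Var', 'h2' and 'u2' values from the unrotated PAF loadings printed above. Those loadings are rounded to two decimals, so the recomputed values can differ from the printed ones by a few hundredths. For example, the first SS loading comes out at about 5.21 rather than 5.18.

```r
# Unrotated PAF loadings copied from the output above (rounded)
L <- matrix(c(-0.60, 0.44, -0.04,
               0.81, 0.18,  0.24,
               0.78, 0.26, -0.22,
               0.81, 0.28, -0.27,
              -0.69, 0.51,  0.04,
               0.84, 0.17,  0.20,
              -0.67, 0.55,  0.08,
               0.78, 0.18, -0.20,
               0.83, 0.15,  0.29),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("hope", "afraid", "outrage", "angry", "happy",
                              "worried", "proud", "irritated", "nervous"),
                            c("PA1", "PA2", "PA3")))

ss_loadings <- colSums(L^2)            # "SS loadings": column sums of squares
prop_var    <- ss_loadings / nrow(L)   # "Proportion Var": share of 9 standardized items
h2          <- rowSums(L^2)            # communality of each item
u2          <- 1 - h2                  # uniqueness of each item

round(rbind(ss_loadings, prop_var, cum_var = cumsum(prop_var)), 2)
round(cbind(h2, u2), 2)
```

Because the items are standardized, each contributes a variance of 1. Dividing an SS loading by 9 therefore gives that factor's share of the total variance.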

Next, review the loadings to see which measures load on which factor (this is why we gave the variables intuitive names). All six negative emotions load strongly and positively on Factor 1 (0.78 to 0.84), while the three positive emotions load negatively on Factor 1 (-0.60 to -0.69) and positively on Factor 2 (0.44 to 0.55). Factor 3 contrasts political anxiety with aversion: afraid, worried, and nervous load positively (0.20 to 0.29), while outrage, angry, and irritated load negatively (-0.20 to -0.27). Factor 3 is not clearly distinct in the unrotated solution, but these modest, patterned loadings suggest that rotation may reveal a clearer structure.

The final items to review are the fit statistics, specifically the Tucker-Lewis Index (TLI) and the Root Mean Square Error of Approximation (RMSEA). A “good” fitting model has a TLI of at least .95; here the TLI is .993, above that conventional cut-point. For the RMSEA, the conventional cut-point for a well-fitting model is less than .05, and here we see .034 (90% CI .025 to .044). The likelihood chi-square is significant (p < 2.5e-07), which is common with a sample as large as N = 3,000. We will also use these values to assess the 3-factor solution against the 2-factor solution shortly. So far, the 3-factor solution looks good, but we still need to investigate further.
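Both fit indices are simple functions of chi-square values that fa() prints. The helpers `rmsea()` and `tli()` below are our own names, and they implement the standard textbook formulas. psych may compute these internally in a slightly different way. Still, plugging in the printed likelihood chi-square (54.24 on 12 df, N = 3000) and null-model chi-square (19349.78 on 36 df) reproduces the reported .034 and .993 to three decimals.

```r
# RMSEA: misfit per degree of freedom, scaled by sample size
rmsea <- function(chisq, df, n) sqrt(max(chisq - df, 0) / (df * (n - 1)))

# TLI: improvement of the model's chi-square/df ratio over the null model's
tli <- function(chisq, df, chisq_null, df_null) {
  ratio_null  <- chisq_null / df_null
  ratio_model <- chisq / df
  (ratio_null - ratio_model) / (ratio_null - 1)
}

# Values printed for the 3-factor PAF solution
round(rmsea(54.24, 12, 3000), 3)
round(tli(54.24, 12, 19349.78, 36), 3)
```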

Finally, we can graph the factor results. In the unrotated solution, all nine emotions load most strongly on Factor 1, even though the positive emotions are negatively related to the negative emotions. This is because the diagram uses the absolute value of the loadings and links each item to the factor on which it loads most strongly.

fa.diagram(paf_result_no) #Graphs the relationship

Given the correlation matrix and the sizable loadings on more than one factor in the unrotated PAF model, we should rotate the solution. We first use an orthogonal rotation, varimax, which keeps the latent factors uncorrelated with one another. It is also common practice to suppress the display of small loadings (<.3), which we do by passing cut = 0.3 to print(). For instance, print(paf_result_var3, cut = 0.3) hides loadings below .3 in the varimax-rotated PAF results.
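An orthogonal rotation only spins the factor axes. It changes how the explained variance is divided among the factors, but not how much variance each item shares with them. The sketch below shows this with base R's `stats::varimax()`, used here as a stand-in. psych's fa() performs its own rotation internally, so the loadings from this sketch may differ slightly from, or be ordered differently than, the printed varimax output. The invariance shown holds for any orthogonal rotation.

```r
# Unrotated PAF loadings copied from the output above (rounded)
L <- matrix(c(-0.60, 0.44, -0.04,
               0.81, 0.18,  0.24,
               0.78, 0.26, -0.22,
               0.81, 0.28, -0.27,
              -0.69, 0.51,  0.04,
               0.84, 0.17,  0.20,
              -0.67, 0.55,  0.08,
               0.78, 0.18, -0.20,
               0.83, 0.15,  0.29),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("hope", "afraid", "outrage", "angry", "happy",
                              "worried", "proud", "irritated", "nervous"),
                            c("PA1", "PA2", "PA3")))

rot   <- varimax(L)               # stats::varimax (Kaiser-normalized by default)
L_rot <- unclass(rot$loadings)    # plain matrix of rotated loadings

# Communalities are unchanged by the rotation...
round(cbind(unrotated = rowSums(L^2), varimax = rowSums(L_rot^2)), 2)
# ...but the SS loadings are spread more evenly across factors
round(rbind(unrotated = colSums(L^2), varimax = colSums(L_rot^2)), 2)
# The rotation matrix is orthogonal: t(T) %*% T is the identity
round(crossprod(rot$rotmat), 10)
```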

#####Principal Axis Factor Analysis, 3 factor solution with no rotation, orthogonal (varimax) & oblique (oblimin)
paf_result_var3 <- fa(df, nfactors = 3, rotate = "varimax",  fm = "pa") #paf model
print(paf_result_var3, cut = 0.3)

## Factor Analysis using method =  pa
## Call: fa(r = df, nfactors = 3, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##             PA3   PA2   PA1   h2   u2 com
## hope             0.69       0.55 0.45 1.3
## afraid     0.43        0.70 0.74 0.26 2.0
## outrage    0.74        0.36 0.72 0.28 1.6
## angry      0.80        0.34 0.80 0.20 1.5
## happy            0.80       0.74 0.26 1.3
## worried    0.47        0.68 0.77 0.23 2.2
## proud            0.83       0.76 0.24 1.2
## irritated  0.69        0.34 0.68 0.32 1.8
## nervous    0.40        0.74 0.79 0.21 1.9
## 
##                        PA3  PA2  PA1
## SS loadings           2.35 2.22 1.98
## Proportion Var        0.26 0.25 0.22
## Cumulative Var        0.26 0.51 0.73
## Proportion Explained  0.36 0.34 0.30
## Cumulative Proportion 0.36 0.70 1.00
## 
## Mean item complexity =  1.7
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  36  with the objective function =  6.46 with Chi Square =  19349.78
## df of  the model are 12  and the objective function was  0.02 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.01 
## 
## The harmonic n.obs is  2993 with the empirical chi square  8.11  with prob <  0.78 
## The total n.obs was  3000  with Likelihood Chi Square =  54.24  with prob <  2.5e-07 
## 
## Tucker Lewis Index of factoring reliability =  0.993
## RMSEA index =  0.034  and the 90 % confidence intervals are  0.025 0.044
## BIC =  -41.83
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA3  PA2  PA1
## Correlation of (regression) scores with factors   0.88 0.91 0.86
## Multiple R square of scores with factors          0.77 0.83 0.74
## Minimum correlation of possible factor scores     0.55 0.67 0.47

paf_result_obl3 <- fa(df, nfactors = 3, rotate = "oblimin", fm = "pa") #PAF approach with oblimin (oblique) rotation; uses the GPArotation package

## Loading required namespace: GPArotation

print(paf_result_obl3, cut = 0.3) #Rotation reveals a cleaner structure than the unrotated solution

## Factor Analysis using method =  pa
## Call: fa(r = df, nfactors = 3, rotate = "oblimin", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##             PA3   PA1   PA2   h2   u2 com
## hope                   0.71 0.55 0.45 1.1
## afraid     0.83             0.74 0.26 1.0
## outrage          0.82       0.72 0.28 1.0
## angry            0.92       0.80 0.20 1.0
## happy                  0.83 0.74 0.26 1.0
## worried    0.77             0.77 0.23 1.0
## proud                  0.89 0.76 0.24 1.0
## irritated        0.74       0.68 0.32 1.0
## nervous    0.92             0.79 0.21 1.0
## 
##                        PA3  PA1  PA2
## SS loadings           2.31 2.20 2.05
## Proportion Var        0.26 0.24 0.23
## Cumulative Var        0.26 0.50 0.73
## Proportion Explained  0.35 0.34 0.31
## Cumulative Proportion 0.35 0.69 1.00
## 
##  With factor correlations of 
##       PA3   PA1   PA2
## PA3  1.00  0.82 -0.59
## PA1  0.82  1.00 -0.53
## PA2 -0.59 -0.53  1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  36  with the objective function =  6.46 with Chi Square =  19349.78
## df of  the model are 12  and the objective function was  0.02 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.01 
## 
## The harmonic n.obs is  2993 with the empirical chi square  8.11  with prob <  0.78 
## The total n.obs was  3000  with Likelihood Chi Square =  54.24  with prob <  2.5e-07 
## 
## Tucker Lewis Index of factoring reliability =  0.993
## RMSEA index =  0.034  and the 90 % confidence intervals are  0.025 0.044
## BIC =  -41.83
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA3  PA1  PA2
## Correlation of (regression) scores with factors   0.96 0.95 0.94
## Multiple R square of scores with factors          0.92 0.91 0.88
## Minimum correlation of possible factor scores     0.84 0.82 0.77

fa.diagram(paf_result_var3) #Graphs the relationship

fa.diagram(paf_result_obl3) #Graphs the relationship

Let’s review the same three items for the rotated solutions. Starting with the “SS loadings”, the values are much more evenly spread across the factors after rotation (2.35, 2.22, and 1.98 for varimax; 2.31, 2.20, and 2.05 for oblimin), so each factor explains a similar share of the variance. After rotation these values are no longer eigenvalues. Rotation does not change the total variance explained (73%), the communalities, or the fit statistics, which are identical to those of the unrotated solution. It only redistributes the explained variance across the factors to make them easier to interpret, and here it succeeded in revealing a clearer pattern. The results strongly suggest three factors, but we will still compare them to a 2-factor solution.

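Next, the loadings in the oblimin solution follow theoretical expectations: anger, outrage, and irritation load on one factor (PA1); hope, happiness, and pride load together on a second (PA2); and fear (afraid), worry, and nervousness load on a third (PA3). The varimax solution shows the same three groups, but the negative emotions still cross-load (for example, worried loads 0.47 on one factor and 0.68 on another) because varimax forces the factors to be uncorrelated. The oblimin rotation allows the factors to correlate, and the aversion and anxiety factors are in fact highly correlated (0.82), which helps explain why oblimin yields a cleaner structure. The diagrams show the same clear pattern. Overall, political emotions appear to be structured into three distinct latent factors.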

Comparing 2 versus 3 Factor Solutions - Oblique rotation

Here, we re-estimate the factor analysis specifying only 2 factors and compare the output to the 3-factor results above. We use only the oblique rotation, but the same procedure applies to an orthogonal rotation.

paf_result_obl2 <- fa(df, nfactors = 2, rotate = "oblimin", fm = "pa") #PAF approach with oblimin rotation, 2 factors
print(paf_result_obl2, cut = 0.3) #Suppress loadings below .3

## Factor Analysis using method =  pa
## Call: fa(r = df, nfactors = 2, rotate = "oblimin", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##             PA1   PA2   h2   u2 com
## hope             0.73 0.55 0.45   1
## afraid     0.80       0.68 0.32   1
## outrage    0.86       0.67 0.33   1
## angry      0.88       0.71 0.29   1
## happy            0.84 0.74 0.26   1
## worried    0.82       0.72 0.28   1
## proud            0.88 0.74 0.26   1
## irritated  0.78       0.63 0.37   1
## nervous    0.78       0.68 0.32   1
## 
##                        PA1  PA2
## SS loadings           4.06 2.07
## Proportion Var        0.45 0.23
## Cumulative Var        0.45 0.68
## Proportion Explained  0.66 0.34
## Cumulative Proportion 0.66 1.00
## 
##  With factor correlations of 
##       PA1   PA2
## PA1  1.00 -0.61
## PA2 -0.61  1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 2 factors are sufficient.
## 
## df null model =  36  with the objective function =  6.46 with Chi Square =  19349.78
## df of  the model are 19  and the objective function was  0.39 
## 
## The root mean square of the residuals (RMSR) is  0.04 
## The df corrected root mean square of the residuals is  0.05 
## 
## The harmonic n.obs is  2993 with the empirical chi square  296.5  with prob <  1e-51 
## The total n.obs was  3000  with Likelihood Chi Square =  1155.67  with prob <  2.8e-233 
## 
## Tucker Lewis Index of factoring reliability =  0.888
## RMSEA index =  0.141  and the 90 % confidence intervals are  0.134 0.148
## BIC =  1003.55
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA1  PA2
## Correlation of (regression) scores with factors   0.96 0.94
## Multiple R square of scores with factors          0.93 0.88
## Minimum correlation of possible factor scores     0.86 0.77

As expected, this approach returns only two factors. This solution accounts for 68% of the total variance, compared with 73% in the three-factor solution. This is our first indication that the 3-factor solution may fit these data better. Next, the loadings show the six negative emotions loading on one factor and the three positive emotions loading on the other, exactly as expected. Note that the communalities (h2) and uniquenesses (u2) are not the same as in the 3-factor solution. Communalities are unchanged by rotation, but they do depend on how many factors are extracted. Dropping the third factor lowers the communalities of the negative emotions (e.g., angry falls from 0.80 to 0.71 and afraid from 0.74 to 0.68), while the positive emotions change little. In other words, the two-factor model leaves some of the variance shared among the negative emotions unexplained.

The final aspect we will review is the fit statistics, compared with the values from the 3-factor solution. The TLI of the 2-factor solution is 0.888, below both the conventional cut-point of 0.95 and the 0.993 of the 3-factor solution. Similarly, its RMSEA of 0.141 is far above both the conventional cut-point of 0.05 and the 0.034 of the 3-factor solution. The BIC, where lower values indicate better fit, tells the same story (1003.55 versus -41.83). Taken as a whole, the 3-factor solution fits these data better.
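The BIC values that fa() prints can be recomputed from the likelihood chi-square, its degrees of freedom, and the sample size as chi-square minus df times log(N). The helper name `bic()` is ours. Plugging in the printed values reproduces the 2-factor BIC (1003.55) and gives about -41.84 for the 3-factor model, versus the printed -41.83. That small gap comes from the chi-square being rounded to two decimals.

```r
# BIC for a factor model: chi-square penalized by df * log(N)
bic <- function(chisq, df, n) chisq - df * log(n)

fits <- data.frame(factors = c(2, 3),
                   chisq   = c(1155.67, 54.24),   # Likelihood Chi Square
                   df      = c(19, 12))
fits$BIC <- bic(fits$chisq, fits$df, n = 3000)
fits$BIC <- round(fits$BIC, 2)
fits   # the model with the lower BIC is preferred
```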

Comparing PCF and PAF Approaches

The following code compares the PAF approach used above with the PCF approach. The code differs only slightly: PCF uses the principal() function, whereas PAF uses fa() with fm = "pa". For each approach we can also choose a rotation. Here we estimate models with no rotation, with varimax (orthogonal) rotation, and with oblimin (oblique) rotation, starting with an unrotated three-factor PCF.

#General pcf code: principal(data frame, nfactors = x, rotate = "...")
#General paf code: fa(data frame, nfactors = x, rotate = "...", fm = "pa")

#####Principal Components Factor Analysis, 3 factor solution with no rotation, orthogonal (varimax) & oblique (oblimin)
pcf_result_no3 <- principal(df,nfactors = 3,  rotate = "none") #PCF approach with no rotation
pcf_result_no3 #Reports same Eigenvalues as reported in Scree Plot

## Principal Components Analysis
## Call: principal(r = df, nfactors = 3, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
##             PC1  PC2   PC3   h2   u2 com
## hope      -0.63 0.58 -0.09 0.74 0.26 2.0
## afraid     0.83 0.21  0.34 0.84 0.16 1.5
## outrage    0.80 0.31 -0.29 0.82 0.18 1.6
## angry      0.82 0.31 -0.28 0.84 0.16 1.5
## happy     -0.71 0.55  0.08 0.81 0.19 1.9
## worried    0.85 0.19  0.26 0.83 0.17 1.3
## proud     -0.68 0.59  0.12 0.82 0.18 2.0
## irritated  0.81 0.22 -0.32 0.80 0.20 1.5
## nervous    0.84 0.17  0.36 0.86 0.14 1.4
## 
##                        PC1  PC2  PC3
## SS loadings           5.44 1.32 0.60
## Proportion Var        0.60 0.15 0.07
## Cumulative Var        0.60 0.75 0.82
## Proportion Explained  0.74 0.18 0.08
## Cumulative Proportion 0.74 0.92 1.00
## 
## Mean item complexity =  1.6
## Test of the hypothesis that 3 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.05 
##  with the empirical chi square  490.65  with prob <  2.2e-97 
## 
## Fit based upon off diagonal values = 0.99

pcf_result_var3 <- principal(df, nfactors = 3, rotate = "varimax") #PCF approach with varimax rotation
pcf_result_var3 #Rotation reveals a cleaner structure than the unrotated solution

## Principal Components Analysis
## Call: principal(r = df, nfactors = 3, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##             RC1   RC2   RC3   h2   u2 com
## hope      -0.08  0.82 -0.27 0.74 0.26 1.2
## afraid     0.37 -0.23  0.81 0.84 0.16 1.6
## outrage    0.81 -0.18  0.36 0.82 0.18 1.5
## angry      0.82 -0.19  0.37 0.84 0.16 1.5
## happy     -0.25  0.84 -0.20 0.81 0.19 1.3
## worried    0.43 -0.26  0.76 0.83 0.17 1.9
## proud     -0.24  0.86 -0.14 0.82 0.18 1.2
## irritated  0.80 -0.26  0.31 0.80 0.20 1.5
## nervous    0.35 -0.27  0.82 0.86 0.14 1.6
## 
##                        RC1  RC2  RC3
## SS loadings           2.53 2.45 2.38
## Proportion Var        0.28 0.27 0.26
## Cumulative Var        0.28 0.55 0.82
## Proportion Explained  0.34 0.33 0.32
## Cumulative Proportion 0.34 0.68 1.00
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 3 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.05 
##  with the empirical chi square  490.65  with prob <  2.2e-97 
## 
## Fit based upon off diagonal values = 0.99

pcf_result_obl3 <- principal(df, nfactors = 3, rotate = "oblimin") #PCF approach with oblimin rotation
pcf_result_obl3 #Rotation reveals a cleaner structure than the unrotated solution

## Principal Components Analysis
## Call: principal(r = df, nfactors = 3, rotate = "oblimin")
## Standardized loadings (pattern matrix) based upon correlation matrix
##             TC3   TC1   TC2   h2   u2 com
## hope      -0.18  0.17  0.84 0.74 0.26 1.2
## afraid     0.91  0.02  0.02 0.84 0.16 1.0
## outrage    0.06  0.87  0.02 0.82 0.18 1.0
## angry      0.07  0.87  0.01 0.84 0.16 1.0
## happy      0.00 -0.09  0.86 0.81 0.19 1.0
## worried    0.80  0.13 -0.02 0.83 0.17 1.1
## proud      0.10 -0.11  0.90 0.82 0.18 1.1
## irritated -0.01  0.86 -0.08 0.80 0.20 1.0
## nervous    0.93 -0.02 -0.03 0.86 0.14 1.0
## 
##                        TC3  TC1  TC2
## SS loadings           2.53 2.48 2.34
## Proportion Var        0.28 0.28 0.26
## Cumulative Var        0.28 0.56 0.82
## Proportion Explained  0.34 0.34 0.32
## Cumulative Proportion 0.34 0.68 1.00
## 
##  With component correlations of 
##       TC3   TC1   TC2
## TC3  1.00  0.72 -0.51
## TC1  0.72  1.00 -0.45
## TC2 -0.51 -0.45  1.00
## 
## Mean item complexity =  1
## Test of the hypothesis that 3 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.05 
##  with the empirical chi square  490.65  with prob <  2.2e-97 
## 
## Fit based upon off diagonal values = 0.99

fa.diagram(pcf_result_obl3)

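The PCF results broadly match the PAF results: the same three groups of items emerge, most clearly after oblimin rotation. This is not surprising, as the two methods often produce very similar results. The main difference is that PCF communalities are higher (0.74 to 0.86 versus 0.55 to 0.80) and PCF explains more of the total variance (82% versus 73%). This happens because principal components treat each item's total variance, including its unique variance, as something to explain, whereas PAF analyzes only the shared variance.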
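With standardized items, the mean communality equals the proportion of total variance that a solution explains. The base-R sketch below copies the h2 columns from the 3-factor PAF and PCF outputs above (rounded to two decimals). It recovers the 73% and 82% cumulative variance figures.

```r
# Communalities (h2) copied from the 3-factor outputs above
h2_paf <- c(hope = 0.55, afraid = 0.74, outrage = 0.72, angry = 0.80, happy = 0.74,
            worried = 0.77, proud = 0.76, irritated = 0.68, nervous = 0.79)
h2_pcf <- c(hope = 0.74, afraid = 0.84, outrage = 0.82, angry = 0.84, happy = 0.81,
            worried = 0.83, proud = 0.82, irritated = 0.80, nervous = 0.86)

# Mean communality = share of the 9 unit variances that the factors explain
round(c(PAF = mean(h2_paf), PCF = mean(h2_pcf)), 2)

# Item-by-item gap: PCF also "explains" part of each item's unique variance
round(h2_pcf - h2_paf, 2)
```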

Confirmatory Factor Analysis (CFA)

Because we have an a priori theory about the factor structure of these 9 political emotions, we can also use confirmatory factor analysis (CFA) to test whether the three-factor structure fits the data.

The ‘lavaan’ package is needed for this analysis. Because we are testing an a priori theory, we have to specify which items load on which latent factor. We specify the three-factor structure that theory predicts:

Factor 1 = Outrage, anger, and irritation
Factor 2 = Pride, happiness, and hope
Factor 3 = Fear (afraid), worry, and nervousness
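Before fitting the model, it can help to count what it estimates. The arithmetic below assumes lavaan's default scaling for cfa(), in which the first indicator of each factor has its loading fixed to 1. Accordingly, outrage, proud and afraid show 1.000 with no standard error in the output below. It also assumes the model has no mean structure. Under these assumptions the counts should match the "Number of model parameters" and "Degrees of freedom" lines that lavaan reports.

```r
p                <- 9   # observed items
n_factors        <- 3
items_per_factor <- 3

moments <- p * (p + 1) / 2                            # unique variances + covariances

free_loadings <- n_factors * (items_per_factor - 1)   # first loading per factor fixed to 1
resid_var     <- p                                    # one residual variance per item
factor_var    <- n_factors                            # latent variances
factor_cov    <- n_factors * (n_factors - 1) / 2      # latent covariances

n_params <- free_loadings + resid_var + factor_var + factor_cov
c(moments = moments, parameters = n_params, df = moments - n_params)
```

A positive df (45 observed moments minus 21 parameters) means the model is over-identified. That is what makes the chi-square and fit indices below testable.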

####Confirmatory Factor Analysis
# Load required packages
library(lavaan)

# Specify the CFA model as a lavaan model string; '=~' reads as 'is measured by'
model <- '
   # Factor 1
   Factor1 =~ outrage + angry  + irritated
   
   # Factor 2
   Factor2 =~ proud + happy  + hope
   
   # Factor 3
   Factor3 =~ afraid  + nervous + worried
'

# Fit the CFA model (rows with missing items are dropped listwise by default)
fit <- cfa(model, data = df)

# Summarize the results
summary(fit, standardized = TRUE, fit.measures = TRUE)  #Gives you summary statistics of the CFA

## lavaan 0.6.15 ended normally after 33 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        21
## 
##                                                   Used       Total
##   Number of observations                          2979        3000
## 
## Model Test User Model:
##                                                       
##   Test statistic                               172.425
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                             19285.634
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.992
##   Tucker-Lewis Index (TLI)                       0.988
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -32679.903
##   Loglikelihood unrestricted model (H1)             NA
##                                                       
##   Akaike (AIC)                               65401.807
##   Bayesian (BIC)                             65527.793
##   Sample-size adjusted Bayesian (SABIC)      65461.068
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.046
##   90 Percent confidence interval - lower         0.039
##   90 Percent confidence interval - upper         0.052
##   P-value H_0: RMSEA <= 0.050                    0.866
##   P-value H_0: RMSEA >= 0.080                    0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.018
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Factor1 =~                                                            
##     outrage           1.000                               1.069    0.853
##     angry             1.004    0.016   61.161    0.000    1.073    0.888
##     irritated         0.876    0.016   55.002    0.000    0.936    0.826
##   Factor2 =~                                                            
##     proud             1.000                               0.990    0.856
##     happy             0.939    0.018   53.034    0.000    0.929    0.872
##     hope              0.875    0.020   44.844    0.000    0.865    0.743
##   Factor3 =~                                                            
##     afraid            1.000                               1.048    0.860
##     nervous           0.996    0.016   62.424    0.000    1.043    0.877
##     worried           0.967    0.015   63.221    0.000    1.014    0.884
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Factor1 ~~                                                            
##     Factor2          -0.597    0.026  -22.979    0.000   -0.565   -0.565
##     Factor3           0.949    0.032   29.954    0.000    0.847    0.847
##   Factor2 ~~                                                            
##     Factor3          -0.640    0.026  -24.582    0.000   -0.617   -0.617
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .outrage           0.426    0.015   28.140    0.000    0.426    0.272
##    .angry             0.309    0.013   24.122    0.000    0.309    0.212
##    .irritated         0.407    0.013   30.299    0.000    0.407    0.317
##    .proud             0.357    0.016   22.904    0.000    0.357    0.267
##    .happy             0.273    0.013   20.897    0.000    0.273    0.240
##    .hope              0.608    0.019   32.074    0.000    0.608    0.448
##    .afraid            0.387    0.013   28.967    0.000    0.387    0.261
##    .nervous           0.325    0.012   27.184    0.000    0.325    0.230
##    .worried           0.286    0.011   26.354    0.000    0.286    0.218
##     Factor1           1.142    0.040   28.233    0.000    1.000    1.000
##     Factor2           0.979    0.036   27.466    0.000    1.000    1.000
##     Factor3           1.098    0.038   28.756    0.000    1.000    1.000

semPaths(fit, "std", whatLabels = "est", edge.label.cex = 0.8) #Path diagram of the CFA: edge weights reflect standardized estimates, labels show unstandardized estimates

There are several things to evaluate in the results of a confirmatory factor analysis. Because we are testing a specific hypothesized structure, model fit matters more here than it does in exploratory factor analysis. We will evaluate the CFA model using the following fit measures:

Root Mean Square Error of Approximation (RMSEA), where lower values indicate a better fitting model.
- RMSEA <= .06 is commonly considered good fit
- Model RMSEA = .046 (90% CI .039 to .052), below the somewhat arbitrary cut point of .06, indicating a good fitting model
Standardized Root Mean Square Residual (SRMR), the average standardized difference between the observed and the model-implied correlations.
- SRMR <= .08 is commonly considered good fit (some authors accept values up to .10)
- Model SRMR = .018, well below either cut point, indicating a good fitting model
Comparative Fit Index (CFI), which reflects the improvement in fit over a null (baseline) model in which the variables are uncorrelated.
- Closer to 1 indicates a better fitting model; >= .90 is considered acceptable and >= .95 good fit
- Model CFI = .992, nearly 1, indicating a very good fitting model
Tucker-Lewis Index (TLI), similar to CFI but with a penalty for model complexity; it also reflects improvement in fit over the null model.
- Closer to 1 indicates a better fitting model; >= .90 is considered acceptable and >= .95 good fit
- Model TLI = .988, also well above these cut points
Akaike (AIC) and Bayesian (BIC) information criteria have no absolute cut point, but they are useful for comparing competing models fit to the same data, with lower values indicating better fit. We will compare these values to those of a two factor solution to see which fits the data best.
- AIC = 65401.8
- BIC = 65527.8
Chi-square test: with a sample this large (N = 2979), the chi-square test will almost always be significant even when misfit is trivial, so it provides little useful information here (chi-square = 172.43, df = 24, p < .001).
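To show where these indices come from, the sketch below recomputes RMSEA, CFI, and TLI in base R from the chi-square values printed in the lavaan summary above (user model: 172.425 on 24 df; baseline model: 19285.634 on 36 df; N = 2979). It uses the standard textbook formulas with N - 1 in the RMSEA denominator; software differs slightly on N versus N - 1, which is negligible at this sample size. The helper name `fit_indices` is our own and is not part of lavaan.

```r
# Recompute common CFA fit indices from chi-square statistics.
# Assumes standard (non-robust) ML chi-square values, as printed by lavaan above.
fit_indices <- function(chisq, df, chisq_base, df_base, n) {
  # RMSEA: misfit per degree of freedom, scaled by sample size
  rmsea <- sqrt(max(chisq - df, 0) / (df * (n - 1)))
  # CFI: reduction in noncentrality relative to the baseline (independence) model
  d_model <- max(chisq - df, 0)
  d_base  <- max(chisq_base - df_base, d_model)
  cfi <- 1 - d_model / d_base
  # TLI: compares the chi-square/df ratios of the user and baseline models
  tli <- ((chisq_base / df_base) - (chisq / df)) / ((chisq_base / df_base) - 1)
  round(c(rmsea = rmsea, cfi = cfi, tli = tli), 3)
}

# Three factor model; values copied from the lavaan summary above
three_factor <- fit_indices(chisq = 172.425, df = 24,
                            chisq_base = 19285.634, df_base = 36, n = 2979)
print(three_factor)
```

This reproduces the RMSEA, CFI, and TLI in the summary (.046, .992, .988). In practice, `fitMeasures(fit, c("rmsea", "srmr", "cfi", "tli", "aic", "bic"))` extracts the same values directly from the fitted lavaan object.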

All of the fit measures for the three factor solution indicate a good fitting model. This provides initial support for the three hypothesized latent emotions. However, we should also test whether an alternative factor structure fits the data better. A natural alternative is a two factor solution, with one factor for the six negative emotions and another for the three positive emotions, because the aversion and worry factors are strongly correlated in the three factor model (r = .85). We will fit this two factor model and compare its results to the initial results.
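The .847 correlation between Factor1 (aversion) and Factor3 (worry) in the Std.all column of the three factor output is what makes merging them a plausible alternative. That correlation can be recovered from the unstandardized factor covariances and variances, since a correlation is a covariance divided by the product of the two standard deviations. The sketch below does this with base R's `cov2cor()`, using numbers copied from the three factor summary; the factor labels are our own. With the fitted object in hand, `lavInspect(fit, "cor.lv")` should return the same matrix directly.

```r
# Latent (co)variance matrix from the three factor lavaan summary:
# diagonal = factor variances, off-diagonal = factor covariances
psi <- matrix(c( 1.142, -0.597,  0.949,
                -0.597,  0.979, -0.640,
                 0.949, -0.640,  1.098),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Aversion", "Enthusiasm", "Worry"),
                              c("Aversion", "Enthusiasm", "Worry")))

# Convert covariances to correlations: cov(i, j) / sqrt(var(i) * var(j))
latent_cor <- cov2cor(psi)
print(round(latent_cor, 3))
```

The result matches the Std.all column (-.565, .847, -.617): the two negative emotion factors are far more closely related to each other than either is to enthusiasm.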

First, we specify a new model that combines the emotions as described above; then we fit it and review the results.

model2 <- '
   # Factor 1: all six negative emotions (aversion and worry combined)
   Factor1 =~ outrage + angry + irritated +
              afraid + nervous + worried

   # Factor 2: the three positive emotions (enthusiasm)
   Factor2 =~ proud + happy + hope
'

# Step 3: Fit the two factor CFA model (loadings are specified directly, so no rotation is involved)
fit2 <- cfa(model2, data = df)

# Step 4: Summarize the results
summary(fit2, standardized = TRUE, fit.measures = TRUE)

## lavaan 0.6.15 ended normally after 25 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        19
## 
##                                                   Used       Total
##   Number of observations                          2979        3000
## 
## Model Test User Model:
##                                                       
##   Test statistic                              1198.925
##   Degrees of freedom                                26
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                             19285.634
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.939
##   Tucker-Lewis Index (TLI)                       0.916
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -33193.153
##   Loglikelihood unrestricted model (H1)             NA
##                                                       
##   Akaike (AIC)                               66424.307
##   Bayesian (BIC)                             66538.294
##   Sample-size adjusted Bayesian (SABIC)      66477.924
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.123
##   90 Percent confidence interval - lower         0.117
##   90 Percent confidence interval - upper         0.129
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.037
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Factor1 =~                                                            
##     outrage           1.000                               1.001    0.799
##     angry             0.989    0.020   50.635    0.000    0.990    0.820
##     irritated         0.891    0.019   47.957    0.000    0.892    0.787
##     afraid            1.016    0.020   51.891    0.000    1.017    0.835
##     nervous           1.000    0.019   52.493    0.000    1.001    0.842
##     worried           0.986    0.018   54.189    0.000    0.987    0.861
##   Factor2 =~                                                            
##     proud             1.000                               0.990    0.857
##     happy             0.938    0.018   53.066    0.000    0.929    0.871
##     hope              0.873    0.019   44.805    0.000    0.864    0.742
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Factor1 ~~                                                            
##     Factor2          -0.619    0.025  -24.360    0.000   -0.624   -0.624
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .outrage           0.567    0.017   33.499    0.000    0.567    0.361
##    .angry             0.479    0.015   32.672    0.000    0.479    0.328
##    .irritated         0.488    0.014   33.897    0.000    0.488    0.380
##    .afraid            0.451    0.014   31.948    0.000    0.451    0.303
##    .nervous           0.412    0.013   31.557    0.000    0.412    0.292
##    .worried           0.339    0.011   30.257    0.000    0.339    0.258
##    .proud             0.356    0.016   22.821    0.000    0.356    0.266
##    .happy             0.274    0.013   20.897    0.000    0.274    0.241
##    .hope              0.609    0.019   32.113    0.000    0.609    0.449
##     Factor1           1.002    0.039   25.805    0.000    1.000    1.000
##     Factor2           0.981    0.036   27.494    0.000    1.000    1.000

semPaths(fit2, "std", whatLabels = "est", edge.label.cex = 0.8) #Path diagram of the CFA: edge weights reflect standardized estimates, labels show unstandardized estimates

Evaluating the same fit measures as before, we can immediately see that the two factor model fits the data worse than the three factor solution:

RMSEA = .046 in three factor vs. .123 in two factor; closer to 0 is better
SRMR = .018 in three factor vs. .037 in two factor; closer to 0 is better
CFI = .992 in three factor vs. .939 in two factor; closer to 1 indicates a better fitting model
TLI = .988 in three factor vs. .916 in two factor; closer to 1 indicates a better fitting model
AIC = 65401.8 in three factor vs. 66424.3 in two factor; lower is better
BIC = 65527.8 in three factor vs. 66538.3 in two factor; lower is better
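AIC and BIC both penalize a model's log-likelihood (LL) for its number of free parameters k: AIC = -2LL + 2k and BIC = -2LL + k ln(N). The sketch below reproduces the values lavaan printed, using the log-likelihoods from the two summaries, k = 21 and 19 free parameters (the "Number of model parameters" line, shown in the two factor output), and N = 2979 complete cases. The helper name `info_criteria` is illustrative only.

```r
# AIC and BIC from a model's log-likelihood, parameter count, and sample size
info_criteria <- function(loglik, k, n) {
  c(aic = -2 * loglik + 2 * k,
    bic = -2 * loglik + k * log(n))
}

# Values copied from the two lavaan summaries above
three_factor <- info_criteria(loglik = -32679.903, k = 21, n = 2979)
two_factor   <- info_criteria(loglik = -33193.153, k = 19, n = 2979)

# Positive differences mean the three factor model is preferred (lower is better)
print(round(two_factor - three_factor, 1))
```

The differences come to about 1022 (AIC) and 1010 (BIC) points, so both criteria clearly favor the three factor model even after BIC's heavier penalty for its two extra parameters.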

Across all of the model fit measures, the three factor solution fits the data better than the two factor solution that combined all of the negatively valenced emotions. This matches the results of the exploratory factor analysis.
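Because the two factor model is the three factor model with the aversion and worry factors merged, the two models are nested, and a chi-square difference (likelihood-ratio) test offers a more formal comparison. In lavaan this is `lavTestLRT(fit, fit2)` (or `anova(fit, fit2)`); the base-R sketch below shows the arithmetic using the chi-square values copied from the two summaries. Strictly, merging two factors amounts to fixing their correlation at 1, a boundary value, so the p-value should be read as approximate; with a difference this large the conclusion does not depend on that.

```r
# Chi-square difference (likelihood-ratio) test for nested CFA models.
# Values copied from the lavaan summaries above (standard ML chi-square).
chisq_three <- 172.425;  df_three <- 24   # three factor model (less restricted)
chisq_two   <- 1198.925; df_two   <- 26   # two factor model (more restricted)

delta_chisq <- chisq_two - chisq_three    # increase in misfit from merging factors
delta_df    <- df_two - df_three          # number of added constraints
p_value     <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)

cat(sprintf("Delta chi-square = %.3f on %d df, p = %.3g\n",
            delta_chisq, delta_df, p_value))
```

The increase of 1026.5 in chi-square for only 2 degrees of freedom is overwhelmingly significant, again favoring the three factor solution.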

Conclusions

In this tutorial, we imported survey data and applied several factor analysis techniques to political emotions in the United States. The results largely support the prevailing theoretical view that political emotions are structured by three distinct latent factors:

Aversion to Politics: Measures anger, outrage, and irritation
Worry about Politics: Measures fear, worry, and nervousness
Enthusiasm about Politics: Measures happiness, hope, and pride