# How I Perform Factor Analysis in R
# Load the Dataset
First, we load the attitude dataset that comes built in with R:
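The loading step is not shown in the text. A minimal sketch, assuming the standard `data()` call for datasets that ship with base R, might look like this:

```r
# attitude ships with base R's datasets package, so no install is needed
data(attitude)

# Quick sanity check on the shape of the data frame
dim(attitude)
str(attitude)
```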
The attitude dataset contains 30 observations on 7 variables related to employee attitudes.
We can view the first few rows using the head() function:
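The output below was presumably produced by a call along these lines:

```r
data(attitude)

# head() returns the first six rows by default
head(attitude)
```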
## rating complaints privileges learning raises critical advance
## 1 43 51 30 39 61 92 45
## 2 63 64 51 54 63 73 47
## 3 71 70 68 69 76 86 48
## 4 61 63 45 47 54 84 35
## 5 81 78 56 66 71 83 47
## 6 43 55 49 44 54 49 34
Next, we'll explore the dataset using skim() from the skimr package and the base summary() function:
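A sketch of the two calls that would give the summaries below, assuming the skimr package is already installed:

```r
library(skimr)   # assumed installed, e.g. via install.packages("skimr")
data(attitude)

# One row per variable: missing counts, mean, sd, quantiles, inline histogram
skim(attitude)

# Base R five-number summary plus the mean for each column
summary(attitude)
```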
Name | attitude |
Number of rows | 30 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
numeric | 7 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
rating | 0 | 1 | 64.63 | 12.17 | 40 | 58.75 | 65.5 | 71.75 | 85 | ▃▃▇▅▅ |
complaints | 0 | 1 | 66.60 | 13.31 | 37 | 58.50 | 65.0 | 77.00 | 90 | ▂▅▇▆▅ |
privileges | 0 | 1 | 53.13 | 12.24 | 30 | 45.00 | 51.5 | 62.50 | 83 | ▂▇▅▅▁ |
learning | 0 | 1 | 56.37 | 11.74 | 34 | 47.00 | 56.5 | 66.75 | 75 | ▃▇▇▃▇ |
raises | 0 | 1 | 64.63 | 10.40 | 43 | 58.25 | 63.5 | 71.00 | 88 | ▂▇▇▆▂ |
critical | 0 | 1 | 74.77 | 9.89 | 49 | 69.25 | 77.5 | 80.00 | 92 | ▂▂▂▇▂ |
advance | 0 | 1 | 42.93 | 10.29 | 25 | 35.00 | 41.0 | 47.75 | 72 | ▅▇▆▂▂ |
## rating complaints privileges learning raises
## Min. :40.00 Min. :37.0 Min. :30.00 Min. :34.00 Min. :43.00
## 1st Qu.:58.75 1st Qu.:58.5 1st Qu.:45.00 1st Qu.:47.00 1st Qu.:58.25
## Median :65.50 Median :65.0 Median :51.50 Median :56.50 Median :63.50
## Mean :64.63 Mean :66.6 Mean :53.13 Mean :56.37 Mean :64.63
## 3rd Qu.:71.75 3rd Qu.:77.0 3rd Qu.:62.50 3rd Qu.:66.75 3rd Qu.:71.00
## Max. :85.00 Max. :90.0 Max. :83.00 Max. :75.00 Max. :88.00
## critical advance
## Min. :49.00 Min. :25.00
## 1st Qu.:69.25 1st Qu.:35.00
## Median :77.50 Median :41.00
## Mean :74.77 Mean :42.93
## 3rd Qu.:80.00 3rd Qu.:47.75
## Max. :92.00 Max. :72.00
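This gives us descriptive statistics and a quick overview of each variable. All seven variables are numeric, none has missing values, and their means range from about 43 (advance) to about 75 (critical).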
To perform factor analysis in R, we need to install and load a few key packages:
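The text does not name the packages. The `fa()` and `fa.parallel()` calls that appear in the output later come from psych, so a minimal setup, assuming psych is the only hard requirement, could be:

```r
# Install psych only if it is missing, then attach it
if (!requireNamespace("psych", quietly = TRUE)) {
  install.packages("psych")
}
library(psych)

# Optional assumption: GPArotation is only needed for oblique rotations
# such as oblimin; the varimax rotation used here does not depend on it.
```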
The fa() and fa.parallel() functions used in the following steps come from the psych package.
We first need to calculate the correlation matrix between all variables using the cor() function:
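A sketch of this step follows. The object name `attitude.cor` is taken from the `Call:` line in the parallel analysis output; rounding to two decimals is an assumption made to match the printed matrix.

```r
data(attitude)

# Pearson correlations between all pairs of variables (cor()'s default)
attitude.cor <- cor(attitude)

# Round for display only; keep the unrounded matrix for later steps
round(attitude.cor, 2)
```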
## rating complaints privileges learning raises critical advance
## rating 1.00 0.83 0.43 0.62 0.59 0.16 0.16
## complaints 0.83 1.00 0.56 0.60 0.67 0.19 0.22
## privileges 0.43 0.56 1.00 0.49 0.45 0.15 0.34
## learning 0.62 0.60 0.49 1.00 0.64 0.12 0.53
## raises 0.59 0.67 0.45 0.64 1.00 0.38 0.57
## critical 0.16 0.19 0.15 0.12 0.38 1.00 0.28
## advance 0.16 0.22 0.34 0.53 0.57 0.28 1.00
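The correlation matrix shows the pairwise relationships between the variables. For example, rating and complaints are strongly correlated (0.83), while critical correlates only weakly with the other variables (0.12 to 0.38).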
Next, we calculate the eigenvalues of the correlation matrix, which give the variance accounted for by each principal component:
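One way to get these values is base R's `eigen()`. The variable name `ev` is only illustrative.

```r
data(attitude)
attitude.cor <- cor(attitude)

# Eigenvalues of the correlation matrix, returned in decreasing order
ev <- eigen(attitude.cor)$values
ev

# For a correlation matrix the eigenvalues sum to the number of variables,
# so dividing by that sum gives each component's share of total variance
round(ev / sum(ev), 2)
```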
## [1] 3.7163758 1.1409219 0.8471915 0.6128697 0.3236728 0.2185306 0.1404378
We use parallel analysis to determine the optimal number of factors to retain:
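The `Call:` line in the output shows that `fa.parallel()` was given the correlation matrix. A sketch of that call, with `parallel` as an illustrative object name:

```r
library(psych)
data(attitude)
attitude.cor <- cor(attitude)

# Same call as in the output's Call line; this also draws a scree plot.
# Assumption worth checking: when only a correlation matrix is supplied,
# the sample size is not known to the function, so passing n.obs = 30
# (or the raw data frame) is probably the safer choice.
parallel <- fa.parallel(attitude.cor)

# Suggested number of factors and components
parallel$nfact
parallel$ncomp
```

Because parallel analysis compares the observed eigenvalues with those from simulated data, the suggested counts can vary slightly between runs.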
## Parallel analysis suggests that the number of factors = 2 and the number of components = 1
## Call: fa.parallel(x = attitude.cor)
## Parallel analysis suggests that the number of factors = 2 and the number of components = 1
##
## Eigen Values of
##
## eigen values of factors
## [1] 3.29 0.52 0.17 0.03 -0.15 -0.23 -0.33
##
## eigen values of simulated factors
## [1] 0.80 0.25 0.14 0.06 -0.04 -0.15 -0.26
##
## eigen values of components
## [1] 3.72 1.14 0.85 0.61 0.32 0.22 0.14
##
## eigen values of simulated components
## [1] 1.40 1.21 1.09 1.00 0.88 0.77 0.65
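Parallel analysis suggests retaining 2 factors (and 1 component). The model below is nevertheless fit with 3 factors, so its results should be read with that in mind.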
We run the factor analysis using fa(), specifying 3 factors, maximum likelihood estimation (fm = "ml"), and varimax rotation:
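The arguments below are copied from the `Call:` line in the output. The object name `fit` is an assumption.

```r
library(psych)
data(attitude)

# Three-factor maximum likelihood solution with an orthogonal varimax rotation
fit <- fa(attitude, nfactors = 3, rotate = "varimax", fm = "ml")

# Full summary: loadings, communalities, variance explained, fit indices
fit

# Optional: hide small loadings to make the pattern easier to read
print(fit$loadings, cutoff = 0.4)
```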
## Factor Analysis using method = ml
## Call: fa(r = attitude, nfactors = 3, rotate = "varimax", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML2 ML1 ML3 h2 u2 com
## rating 0.85 0.23 0.06 0.77 0.227 1.2
## complaints 0.93 0.13 0.21 0.92 0.080 1.1
## privileges 0.48 0.26 0.25 0.36 0.639 2.1
## learning 0.50 0.86 0.14 1.00 0.005 1.7
## raises 0.54 0.34 0.59 0.76 0.239 2.6
## critical 0.11 0.00 0.46 0.23 0.771 1.1
## advance 0.02 0.51 0.67 0.70 0.299 1.9
##
## ML2 ML1 ML3
## SS loadings 2.36 1.24 1.14
## Proportion Var 0.34 0.18 0.16
## Cumulative Var 0.34 0.51 0.68
## Proportion Explained 0.50 0.26 0.24
## Cumulative Proportion 0.50 0.76 1.00
##
## Mean item complexity = 1.7
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 21 with the objective function = 3.82 with Chi Square = 98.75
## df of the model are 3 and the objective function was 0.09
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.06
##
## The harmonic n.obs is 30 with the empirical chi square 0.75 with prob < 0.86
## The total n.obs was 30 with Likelihood Chi Square = 2.06 with prob < 0.56
##
## Tucker Lewis Index of factoring reliability = 1.094
## RMSEA index = 0 and the 90 % confidence intervals are 0 0.272
## BIC = -8.14
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## ML2 ML1 ML3
## Correlation of (regression) scores with factors 0.96 0.98 0.85
## Multiple R square of scores with factors 0.92 0.97 0.73
## Minimum correlation of possible factor scores 0.83 0.93 0.45
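The loadings suggest a rough interpretation: rating and complaints load most strongly on ML2 (0.85 and 0.93), learning on ML1 (0.86), and advance and raises on ML3 (0.67 and 0.59). Together the three factors account for 68% of the total variance, although critical is poorly explained (communality 0.23).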
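We can also ask for the phi matrix of correlations between factors. Because varimax is an orthogonal rotation, the factors are constrained to be uncorrelated and there is no phi matrix to display; one is only estimated under an oblique rotation such as oblimin.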
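A sketch of how the factor correlations could be inspected follows. It assumes the fitted object is called `fit` and that GPArotation is available for the oblique comparison; the oblimin refit is an illustration, not part of the original analysis.

```r
library(psych)
data(attitude)

fit <- fa(attitude, nfactors = 3, rotate = "varimax", fm = "ml")

# With an orthogonal rotation no factor correlation matrix is estimated,
# so this element is expected to be empty
is.null(fit$Phi)

# Illustrative oblique refit, which does estimate factor correlations
if (requireNamespace("GPArotation", quietly = TRUE)) {
  fit_oblique <- fa(attitude, nfactors = 3, rotate = "oblimin", fm = "ml")
  round(fit_oblique$Phi, 2)
}
```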
And get the factor scores for each observation:
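Assuming the model was fit on the raw data frame (as the `Call:` line indicates) and stored in an object named `fit`, the scores can be read from the fitted object:

```r
library(psych)
data(attitude)

fit <- fa(attitude, nfactors = 3, rotate = "varimax", fm = "ml")

# One row per observation, one column per factor
# (fa() estimates scores by the regression method unless told otherwise)
scores <- fit$scores
dim(scores)
head(scores)
```

Scores are only available when `fa()` is given raw data rather than a correlation matrix.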
## ML2 ML1 ML3
## [1,] -1.41538780 -1.06514297 1.00764181
## [2,] -0.24903674 -0.12131694 0.21317826
## [3,] 0.21355477 1.04773346 0.50130425
## [4,] -0.15233625 -0.77386937 -0.44618731
## [5,] 0.88977285 0.41779630 0.15857544
## [6,] -0.88864736 -0.59583364 -0.75371034
## [7,] 0.01302784 0.01953183 -0.44415554
## [8,] 0.66671017 -0.52517923 0.06013209
## [9,] 1.24109552 0.41855305 -0.57665883
## [10,] -0.23597433 -0.81259233 0.15779186
## [11,] -0.63116055 0.69821950 -1.12342096
## [12,] -0.25282024 -1.60734070 0.27346115
## [13,] 0.21579677 -1.37154521 -1.09786725
## [14,] 1.24654698 -1.84535566 0.03642333
## [15,] 0.75579367 1.08311604 0.17541965
## [16,] 1.79193374 0.77753662 -1.70322423
## [17,] 0.88180940 0.52509454 1.43297263
## [18,] -0.71811700 2.30153920 -0.28642631
## [19,] 0.11574260 -0.13339780 0.82117222
## [20,] -0.92822722 0.17469228 0.81369897
## [21,] -1.61939235 -1.13164818 -0.94439383
## [22,] -0.33146109 0.78250797 -0.24608582
## [23,] -0.16790787 -0.55091801 0.07390313
## [24,] -2.32539194 1.61455353 -0.74655460
## [25,] -0.58127256 -0.45846705 -0.22690983
## [26,] 0.12801886 0.17920319 2.66934682
## [27,] 0.60983400 1.34465079 0.31590345
## [28,] -0.81808856 -0.62021006 -0.25793519
## [29,] 1.27092548 0.62674001 0.62444421
## [30,] 1.27465921 -0.39865115 -0.48183922
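These scores place each observation on the three factors and can be used as variables in further analysis, for example as predictors in a regression.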
That covers the key steps in performing factor analysis on the attitude dataset in R! Let me know if you have any other questions.