Activity 3: Factor Analysis

We will use the Places Rated Almanac data (Boyer and Savageau) which rates 329 communities according to nine criteria:

Climate and Terrain
Housing
Health Care & Environment
Crime
Transportation
Education
The Arts
Recreation
Economics

We import our data set.

library(psych)
library(readxl)
library(REdaS)

## Warning: package 'REdaS' was built under R version 4.1.3

## Loading required package: grid

places <- read.table("D:/School/4th Year College/Second Semester/Multivariate Data Analysis/Activity/places.txt", quote="\"", comment.char="")
 View(places)

attach(places)

data <- log10(places[,1:9])
head(data)

##         V1       V2       V3       V4       V5       V6       V7       V8
## 1 2.716838 3.792392 2.374748 2.965202 3.605413 3.440437 2.998259 3.147676
## 2 2.759668 3.910518 3.219060 2.947434 3.688687 3.387034 3.745387 3.420286
## 3 2.670246 3.865637 2.790988 2.986772 3.403292 3.408240 2.374748 2.933993
## 4 2.677607 3.898067 3.155640 2.785330 3.837778 3.531351 3.667920 3.208710
## 5 2.818885 3.923917 3.267875 3.171141 3.816771 3.480869 3.652826 3.416973
## 6 2.716003 3.764848 2.806180 2.861534 3.388101 3.473049 2.523746 3.007748
##         V9
## 1 3.882695
## 2 3.638489
## 3 3.720159
## 4 3.768194
## 5 3.757927
## 6 3.720490

Bartletts test of spherecity

bart_spher(places)

##  Bartlett's Test of Sphericity
## 
## Call: bart_spher(x = places)
## 
##      X2 = 1057.162
##      df = 45
## p-value < 2.22e-16

KMO(places)

## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = places)
## Overall MSA =  0.7
## MSA for each item = 
##   V1   V2   V3   V4   V5   V6   V7   V8   V9  V10 
## 0.56 0.69 0.69 0.64 0.86 0.73 0.71 0.81 0.38 0.65

The sampling adequacy was acceptable with KMO = 0.7 and Barlett’s test of sphericity demonstrated that correlations between items were large enough for factor analysis wiht p-value being significant.

How many factors are going to cOnsider:

fa(r = data, nfactors = 9, rotate = "varimax")

## Factor Analysis using method =  minres
## Call: fa(r = data, nfactors = 9, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##     MR1  MR7   MR2  MR5   MR4   MR3   MR8   MR6 MR9   h2   u2 com
## V1 0.06 0.06 -0.08 0.04  0.13  0.74  0.02 -0.01   0 0.58 0.42 1.1
## V2 0.54 0.32  0.36 0.05 -0.15  0.35  0.41  0.04   0 0.84 0.16 4.5
## V3 0.21 0.77 -0.01 0.40  0.08  0.08  0.09 -0.01   0 0.81 0.19 1.8
## V4 0.20 0.09  0.18 0.01  0.77  0.17 -0.02  0.01   0 0.70 0.30 1.4
## V5 0.52 0.18 -0.08 0.38  0.24 -0.10  0.12  0.24   0 0.60 0.40 3.5
## V6 0.06 0.24  0.10 0.66 -0.01  0.06  0.00  0.00   0 0.51 0.49 1.4
## V7 0.57 0.58  0.05 0.25  0.20  0.10 -0.06  0.10   0 0.78 0.22 2.8
## V8 0.69 0.11  0.12 0.01  0.15  0.08  0.01 -0.05   0 0.53 0.47 1.3
## V9 0.09 0.00  0.80 0.08  0.16 -0.09  0.03 -0.01   0 0.69 0.31 1.2
## 
##                        MR1  MR7  MR2  MR5  MR4  MR3  MR8  MR6  MR9
## SS loadings           1.46 1.15 0.84 0.81 0.78 0.74 0.20 0.07 0.00
## Proportion Var        0.16 0.13 0.09 0.09 0.09 0.08 0.02 0.01 0.00
## Cumulative Var        0.16 0.29 0.38 0.47 0.56 0.64 0.66 0.67 0.67
## Proportion Explained  0.24 0.19 0.14 0.13 0.13 0.12 0.03 0.01 0.00
## Cumulative Proportion 0.24 0.43 0.57 0.70 0.83 0.95 0.99 1.00 1.00
## 
## Mean item complexity =  2.1
## Test of the hypothesis that 9 factors are sufficient.
## 
## The degrees of freedom for the null model are  36  and the objective function was  2.59 with Chi Square of  839.43
## The degrees of freedom for the model are -9  and the objective function was  0 
## 
## The root mean square of the residuals (RMSR) is  0 
## The df corrected root mean square of the residuals is  NA 
## 
## The harmonic number of observations is  329 with the empirical chi square  0  with prob <  NA 
## The total number of observations was  329  with Likelihood Chi Square =  0  with prob <  NA 
## 
## Tucker Lewis Index of factoring reliability =  1.046
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1  MR7  MR2  MR5  MR4  MR3
## Correlation of (regression) scores with factors   0.82 0.82 0.84 0.72 0.82 0.79
## Multiple R square of scores with factors          0.68 0.67 0.70 0.52 0.68 0.62
## Minimum correlation of possible factor scores     0.36 0.34 0.40 0.04 0.35 0.24
##                                                     MR8   MR6 MR9
## Correlation of (regression) scores with factors    0.58  0.33   0
## Multiple R square of scores with factors           0.34  0.11   0
## Minimum correlation of possible factor scores     -0.33 -0.78  -1

Observing the ss loadings which suggest the eigenvalue of each factor, we have seen a 3 significant values; hence, we use 3 factors.

Here, we compare the principal axis factor analysis to the maximum likelihood.

pa <- fa(r = data, nfactors = 3, rotate = "varimax", fm = "pa", residuals = TRUE)

## maximum iteration exceeded

## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.

## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully

pa

## Factor Analysis using method =  pa
## Call: fa(r = data, nfactors = 3, rotate = "varimax", residuals = TRUE, 
##     fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##     PA1  PA3   PA2   h2    u2 com
## V1 0.07 0.07  1.05 1.11 -0.11 1.0
## V2 0.36 0.50  0.18 0.41  0.59 2.1
## V3 0.87 0.13  0.09 0.78  0.22 1.1
## V4 0.11 0.44  0.15 0.22  0.78 1.4
## V5 0.47 0.39 -0.03 0.38  0.62 1.9
## V6 0.51 0.05  0.02 0.27  0.73 1.0
## V7 0.68 0.52  0.09 0.74  0.26 1.9
## V8 0.20 0.66  0.07 0.48  0.52 1.2
## V9 0.02 0.37 -0.09 0.15  0.85 1.1
## 
##                        PA1  PA3  PA2
## SS loadings           1.90 1.46 1.19
## Proportion Var        0.21 0.16 0.13
## Cumulative Var        0.21 0.37 0.51
## Proportion Explained  0.42 0.32 0.26
## Cumulative Proportion 0.42 0.74 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  36  and the objective function was  2.59 with Chi Square of  839.43
## The degrees of freedom for the model are 12  and the objective function was  0.3 
## 
## The root mean square of the residuals (RMSR) is  0.05 
## The df corrected root mean square of the residuals is  0.09 
## 
## The harmonic number of observations is  329 with the empirical chi square  68.23  with prob <  6.9e-10 
## The total number of observations was  329  with Likelihood Chi Square =  96.38  with prob <  2.8e-15 
## 
## Tucker Lewis Index of factoring reliability =  0.683
## RMSEA index =  0.146  and the 90 % confidence intervals are  0.12 0.174
## BIC =  26.83
## Fit based upon off diagonal values = 0.97

ml <- fa(r = data, nfactors = 3, rotate = "varimax", fm = "ml", residuals = TRUE)

ml

## Factor Analysis using method =  ml
## Call: fa(r = data, nfactors = 3, rotate = "varimax", residuals = TRUE, 
##     fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##      ML2  ML1  ML3    h2    u2 com
## V1  0.11 0.25 0.06 0.077 0.923 1.5
## V2  0.28 0.95 0.11 0.995 0.005 1.2
## V3  0.87 0.18 0.17 0.815 0.185 1.2
## V4  0.09 0.07 0.50 0.258 0.742 1.1
## V5  0.37 0.17 0.48 0.398 0.602 2.1
## V6  0.51 0.06 0.08 0.264 0.736 1.1
## V7  0.61 0.29 0.56 0.771 0.229 2.4
## V8  0.10 0.39 0.56 0.479 0.521 1.9
## V9 -0.03 0.30 0.16 0.117 0.883 1.5
## 
##                        ML2  ML1  ML3
## SS loadings           1.61 1.37 1.19
## Proportion Var        0.18 0.15 0.13
## Cumulative Var        0.18 0.33 0.46
## Proportion Explained  0.39 0.33 0.29
## Cumulative Proportion 0.39 0.71 1.00
## 
## Mean item complexity =  1.6
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  36  and the objective function was  2.59 with Chi Square of  839.43
## The degrees of freedom for the model are 12  and the objective function was  0.26 
## 
## The root mean square of the residuals (RMSR) is  0.06 
## The df corrected root mean square of the residuals is  0.1 
## 
## The harmonic number of observations is  329 with the empirical chi square  86.12  with prob <  2.8e-13 
## The total number of observations was  329  with Likelihood Chi Square =  82.31  with prob <  1.5e-12 
## 
## Tucker Lewis Index of factoring reliability =  0.736
## RMSEA index =  0.133  and the 90 % confidence intervals are  0.107 0.162
## BIC =  12.75
## Fit based upon off diagonal values = 0.96
## Measures of factor score adequacy             
##                                                    ML2  ML1  ML3
## Correlation of (regression) scores with factors   0.89 0.99 0.80
## Multiple R square of scores with factors          0.80 0.98 0.64
## Minimum correlation of possible factor scores     0.60 0.96 0.29

As we can observe on the result, the cumulative variance for principal axis is 0.51 while the cumulative variance for maximum likelihood is 0.46. Since the principal axis is larger than the cumulative variance for maximum likelihood, we will consider the result for principal axis.

Now, considering the principal axis, observe the following (observing which factor does the variable belong by observing which value is the highest among the 3 factors):

Note: We will exclude (V10) since it is not included on the feature that need to be rated. Also, this column is only a sequential number that indicated the number of places rated.

PA1 is related to health care and environment (V3), transportation (V5), education (V6) and the arts (V7);

PA2 is related to climate and terrain (V1);and

PA3 is related to housing (V2), crime (V4), and economics (V9).

Moreover, the column h2 explains the amount of variance in each variable. Here, as we can observe, climate and terrain (V1) has the highest variability with 1.1129836. So, we can say that with the given data and using the factor analysis, the best feature offered in this communities is the climate and terrain. On the contrary, as we can see in u2 column, it is the difference of 1 - h2. Hence, it suggest that the higher the h2 value we get a lower u2 value and the lower the h2 value the higher the u2 value, which implies that when we get a higher value for u2, it is the feature with least variability. Here, we see that the health care and environment (V3) is not explained well in the factor analysis with u2 = 0.2169270.

pa <- fa(r = data, nfactors = 3, rotate = "varimax", fm = "pa", residuals = TRUE)

## maximum iteration exceeded

## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.

## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully

fa.diagram(pa,main="data")

Activity 3: Factor Analysis

Darlene Dale Lumayot

2022-03-29