We will use the Places Rated Almanac data (Boyer and Savageau) which rates 329 communities according to nine criteria:
We import our data set.
library(psych)
library(readxl)
library(REdaS)
## Warning: package 'REdaS' was built under R version 4.1.3
## Loading required package: grid
places <- read.table("D:/School/4th Year College/Second Semester/Multivariate Data Analysis/Activity/places.txt", quote="\"", comment.char="")
View(places)
attach(places)
data <- log10(places[,1:9])
head(data)
## V1 V2 V3 V4 V5 V6 V7 V8
## 1 2.716838 3.792392 2.374748 2.965202 3.605413 3.440437 2.998259 3.147676
## 2 2.759668 3.910518 3.219060 2.947434 3.688687 3.387034 3.745387 3.420286
## 3 2.670246 3.865637 2.790988 2.986772 3.403292 3.408240 2.374748 2.933993
## 4 2.677607 3.898067 3.155640 2.785330 3.837778 3.531351 3.667920 3.208710
## 5 2.818885 3.923917 3.267875 3.171141 3.816771 3.480869 3.652826 3.416973
## 6 2.716003 3.764848 2.806180 2.861534 3.388101 3.473049 2.523746 3.007748
## V9
## 1 3.882695
## 2 3.638489
## 3 3.720159
## 4 3.768194
## 5 3.757927
## 6 3.720490
Bartletts test of spherecity
bart_spher(places)
## Bartlett's Test of Sphericity
##
## Call: bart_spher(x = places)
##
## X2 = 1057.162
## df = 45
## p-value < 2.22e-16
KMO(places)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = places)
## Overall MSA = 0.7
## MSA for each item =
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 0.56 0.69 0.69 0.64 0.86 0.73 0.71 0.81 0.38 0.65
The sampling adequacy was acceptable with KMO = 0.7 and Barlett’s test of sphericity demonstrated that correlations between items were large enough for factor analysis wiht p-value being significant.
How many factors are going to cOnsider:
fa(r = data, nfactors = 9, rotate = "varimax")
## Factor Analysis using method = minres
## Call: fa(r = data, nfactors = 9, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR7 MR2 MR5 MR4 MR3 MR8 MR6 MR9 h2 u2 com
## V1 0.06 0.06 -0.08 0.04 0.13 0.74 0.02 -0.01 0 0.58 0.42 1.1
## V2 0.54 0.32 0.36 0.05 -0.15 0.35 0.41 0.04 0 0.84 0.16 4.5
## V3 0.21 0.77 -0.01 0.40 0.08 0.08 0.09 -0.01 0 0.81 0.19 1.8
## V4 0.20 0.09 0.18 0.01 0.77 0.17 -0.02 0.01 0 0.70 0.30 1.4
## V5 0.52 0.18 -0.08 0.38 0.24 -0.10 0.12 0.24 0 0.60 0.40 3.5
## V6 0.06 0.24 0.10 0.66 -0.01 0.06 0.00 0.00 0 0.51 0.49 1.4
## V7 0.57 0.58 0.05 0.25 0.20 0.10 -0.06 0.10 0 0.78 0.22 2.8
## V8 0.69 0.11 0.12 0.01 0.15 0.08 0.01 -0.05 0 0.53 0.47 1.3
## V9 0.09 0.00 0.80 0.08 0.16 -0.09 0.03 -0.01 0 0.69 0.31 1.2
##
## MR1 MR7 MR2 MR5 MR4 MR3 MR8 MR6 MR9
## SS loadings 1.46 1.15 0.84 0.81 0.78 0.74 0.20 0.07 0.00
## Proportion Var 0.16 0.13 0.09 0.09 0.09 0.08 0.02 0.01 0.00
## Cumulative Var 0.16 0.29 0.38 0.47 0.56 0.64 0.66 0.67 0.67
## Proportion Explained 0.24 0.19 0.14 0.13 0.13 0.12 0.03 0.01 0.00
## Cumulative Proportion 0.24 0.43 0.57 0.70 0.83 0.95 0.99 1.00 1.00
##
## Mean item complexity = 2.1
## Test of the hypothesis that 9 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 2.59 with Chi Square of 839.43
## The degrees of freedom for the model are -9 and the objective function was 0
##
## The root mean square of the residuals (RMSR) is 0
## The df corrected root mean square of the residuals is NA
##
## The harmonic number of observations is 329 with the empirical chi square 0 with prob < NA
## The total number of observations was 329 with Likelihood Chi Square = 0 with prob < NA
##
## Tucker Lewis Index of factoring reliability = 1.046
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1 MR7 MR2 MR5 MR4 MR3
## Correlation of (regression) scores with factors 0.82 0.82 0.84 0.72 0.82 0.79
## Multiple R square of scores with factors 0.68 0.67 0.70 0.52 0.68 0.62
## Minimum correlation of possible factor scores 0.36 0.34 0.40 0.04 0.35 0.24
## MR8 MR6 MR9
## Correlation of (regression) scores with factors 0.58 0.33 0
## Multiple R square of scores with factors 0.34 0.11 0
## Minimum correlation of possible factor scores -0.33 -0.78 -1
Observing the ss loadings which suggest the eigenvalue of each factor, we have seen a 3 significant values; hence, we use 3 factors.
Here, we compare the principal axis factor analysis to the maximum likelihood.
pa <- fa(r = data, nfactors = 3, rotate = "varimax", fm = "pa", residuals = TRUE)
## maximum iteration exceeded
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
pa
## Factor Analysis using method = pa
## Call: fa(r = data, nfactors = 3, rotate = "varimax", residuals = TRUE,
## fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA3 PA2 h2 u2 com
## V1 0.07 0.07 1.05 1.11 -0.11 1.0
## V2 0.36 0.50 0.18 0.41 0.59 2.1
## V3 0.87 0.13 0.09 0.78 0.22 1.1
## V4 0.11 0.44 0.15 0.22 0.78 1.4
## V5 0.47 0.39 -0.03 0.38 0.62 1.9
## V6 0.51 0.05 0.02 0.27 0.73 1.0
## V7 0.68 0.52 0.09 0.74 0.26 1.9
## V8 0.20 0.66 0.07 0.48 0.52 1.2
## V9 0.02 0.37 -0.09 0.15 0.85 1.1
##
## PA1 PA3 PA2
## SS loadings 1.90 1.46 1.19
## Proportion Var 0.21 0.16 0.13
## Cumulative Var 0.21 0.37 0.51
## Proportion Explained 0.42 0.32 0.26
## Cumulative Proportion 0.42 0.74 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 2.59 with Chi Square of 839.43
## The degrees of freedom for the model are 12 and the objective function was 0.3
##
## The root mean square of the residuals (RMSR) is 0.05
## The df corrected root mean square of the residuals is 0.09
##
## The harmonic number of observations is 329 with the empirical chi square 68.23 with prob < 6.9e-10
## The total number of observations was 329 with Likelihood Chi Square = 96.38 with prob < 2.8e-15
##
## Tucker Lewis Index of factoring reliability = 0.683
## RMSEA index = 0.146 and the 90 % confidence intervals are 0.12 0.174
## BIC = 26.83
## Fit based upon off diagonal values = 0.97
ml <- fa(r = data, nfactors = 3, rotate = "varimax", fm = "ml", residuals = TRUE)
ml
## Factor Analysis using method = ml
## Call: fa(r = data, nfactors = 3, rotate = "varimax", residuals = TRUE,
## fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML2 ML1 ML3 h2 u2 com
## V1 0.11 0.25 0.06 0.077 0.923 1.5
## V2 0.28 0.95 0.11 0.995 0.005 1.2
## V3 0.87 0.18 0.17 0.815 0.185 1.2
## V4 0.09 0.07 0.50 0.258 0.742 1.1
## V5 0.37 0.17 0.48 0.398 0.602 2.1
## V6 0.51 0.06 0.08 0.264 0.736 1.1
## V7 0.61 0.29 0.56 0.771 0.229 2.4
## V8 0.10 0.39 0.56 0.479 0.521 1.9
## V9 -0.03 0.30 0.16 0.117 0.883 1.5
##
## ML2 ML1 ML3
## SS loadings 1.61 1.37 1.19
## Proportion Var 0.18 0.15 0.13
## Cumulative Var 0.18 0.33 0.46
## Proportion Explained 0.39 0.33 0.29
## Cumulative Proportion 0.39 0.71 1.00
##
## Mean item complexity = 1.6
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 2.59 with Chi Square of 839.43
## The degrees of freedom for the model are 12 and the objective function was 0.26
##
## The root mean square of the residuals (RMSR) is 0.06
## The df corrected root mean square of the residuals is 0.1
##
## The harmonic number of observations is 329 with the empirical chi square 86.12 with prob < 2.8e-13
## The total number of observations was 329 with Likelihood Chi Square = 82.31 with prob < 1.5e-12
##
## Tucker Lewis Index of factoring reliability = 0.736
## RMSEA index = 0.133 and the 90 % confidence intervals are 0.107 0.162
## BIC = 12.75
## Fit based upon off diagonal values = 0.96
## Measures of factor score adequacy
## ML2 ML1 ML3
## Correlation of (regression) scores with factors 0.89 0.99 0.80
## Multiple R square of scores with factors 0.80 0.98 0.64
## Minimum correlation of possible factor scores 0.60 0.96 0.29
As we can observe on the result, the cumulative variance for principal axis is 0.51 while the cumulative variance for maximum likelihood is 0.46. Since the principal axis is larger than the cumulative variance for maximum likelihood, we will consider the result for principal axis.
Now, considering the principal axis, observe the following (observing which factor does the variable belong by observing which value is the highest among the 3 factors):
Note: We will exclude (V10) since it is not included on the feature that need to be rated. Also, this column is only a sequential number that indicated the number of places rated.
PA1 is related to health care and environment (V3), transportation (V5), education (V6) and the arts (V7);
PA2 is related to climate and terrain (V1);and
PA3 is related to housing (V2), crime (V4), and economics (V9).
Moreover, the column h2 explains the amount of variance in each variable. Here, as we can observe, climate and terrain (V1) has the highest variability with 1.1129836. So, we can say that with the given data and using the factor analysis, the best feature offered in this communities is the climate and terrain. On the contrary, as we can see in u2 column, it is the difference of 1 - h2. Hence, it suggest that the higher the h2 value we get a lower u2 value and the lower the h2 value the higher the u2 value, which implies that when we get a higher value for u2, it is the feature with least variability. Here, we see that the health care and environment (V3) is not explained well in the factor analysis with u2 = 0.2169270.
pa <- fa(r = data, nfactors = 3, rotate = "varimax", fm = "pa", residuals = TRUE)
## maximum iteration exceeded
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
fa.diagram(pa,main="data")