Factor Analysis in R
library("psych")
library("readxl")
library("REdaS")
## Loading required package: grid
# Load the data
places <- read.table("C:/1. School/School/4th Year/2nd Semester/Multivariate Data analysis/Activities/Activity 2/places.txt", quote="\"", comment.char="")
head(places)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 1 521 6200 237 923 4031 2757 996 1405 7633 1
## 2 575 8138 1656 886 4883 2438 5564 2632 4350 2
## 3 468 7339 618 970 2531 2560 237 859 5250 3
## 4 476 7908 1431 610 6883 3399 4655 1617 5864 4
## 5 659 8393 1853 1483 6558 3026 4496 2612 5727 5
## 6 520 5819 640 727 2444 2972 334 1018 5254 6
Selecting the first nine columns (V1 to V9) as the variables for the analysis; V10, which appears to be a row index, is dropped
placespc<-places[,1:9]
head(placespc)
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## 1 521 6200 237 923 4031 2757 996 1405 7633
## 2 575 8138 1656 886 4883 2438 5564 2632 4350
## 3 468 7339 618 970 2531 2560 237 859 5250
## 4 476 7908 1431 610 6883 3399 4655 1617 5864
## 5 659 8393 1853 1483 6558 3026 4496 2612 5727
## 6 520 5819 640 727 2444 2972 334 1018 5254
Plotting a histogram of each variable
par("mfcol"=c(3, 1))
hist(placespc$V1, col="blue")
hist(placespc$V2, col="green")
hist(placespc$V3, col="red")
hist(placespc$V4, col="blue")
hist(placespc$V5, col="green")
hist(placespc$V6, col="red")
hist(placespc$V7, col="blue")
hist(placespc$V8, col="green")
hist(placespc$V9, col="red")
par("mfcol"=c(1, 1))
Discussion:
As the histograms above show, most of the variables are right-skewed, with long tails toward high values. To reduce this skew and bring the distributions closer to normal, we apply a log transformation to each variable.
Applying a base-10 log transformation to the skewed variables
new.data_places<-log10(placespc)
head(new.data_places)
## V1 V2 V3 V4 V5 V6 V7 V8
## 1 2.716838 3.792392 2.374748 2.965202 3.605413 3.440437 2.998259 3.147676
## 2 2.759668 3.910518 3.219060 2.947434 3.688687 3.387034 3.745387 3.420286
## 3 2.670246 3.865637 2.790988 2.986772 3.403292 3.408240 2.374748 2.933993
## 4 2.677607 3.898067 3.155640 2.785330 3.837778 3.531351 3.667920 3.208710
## 5 2.818885 3.923917 3.267875 3.171141 3.816771 3.480869 3.652826 3.416973
## 6 2.716003 3.764848 2.806180 2.861534 3.388101 3.473049 2.523746 3.007748
## V9
## 1 3.882695
## 2 3.638489
## 3 3.720159
## 4 3.768194
## 5 3.757927
## 6 3.720490
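One way to check that the transformation reduced the right skew is to compare the sample skewness before and after taking logs. The sketch below is only an illustration: `skewness()` is a helper defined here (not part of the original analysis), and the data are simulated log-normal values standing in for one column of the places data.

```r
# Sample skewness (third standardized moment), base R only.
# Hypothetical helper, not part of the original analysis.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Simulated stand-in for one right-skewed column (NOT the places data).
set.seed(1)
raw <- rlnorm(329, meanlog = 7, sdlog = 0.8)

skew_raw <- skewness(raw)         # positive for right-skewed data
skew_log <- skewness(log10(raw))  # near zero after the log transform

print(c(raw = skew_raw, log10 = skew_log))
```

With the real data, the same comparison would be `sapply(placespc, skewness)` against `sapply(new.data_places, skewness)`.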
bart_spher(new.data_places)
## Bartlett's Test of Sphericity
##
## Call: bart_spher(x = new.data_places)
##
## X2 = 839.427
## df = 36
## p-value < 2.22e-16
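Bartlett's test asks whether the correlation matrix differs from an identity matrix; a small p-value, as here, suggests the variables are correlated enough for factor analysis. The sketch below computes the usual textbook form of the statistic in base R. It is assumed, not confirmed from the source, that `bart_spher()` uses the same formula; the function name `bartlett_sphericity` and the simulated data are illustrative only.

```r
# Bartlett's test of sphericity, textbook form:
#   X2 = -(n - 1 - (2p + 5) / 6) * log(det(R)),  df = p(p - 1) / 2
# Hypothetical helper; assumes complete numeric data.
bartlett_sphericity <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  R <- cor(x)
  chi2 <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chi2 = chi2, df = df,
       p.value = pchisq(chi2, df, lower.tail = FALSE))
}

# Simulated stand-in: 9 variables sharing one common component.
set.seed(1)
common <- rnorm(329)
sim <- sapply(1:9, function(i) common + rnorm(329))

res <- bartlett_sphericity(sim)
print(res)
```

For nine variables the degrees of freedom are 9 * 8 / 2 = 36, which agrees with the `df = 36` reported above.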
KMO(new.data_places)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = new.data_places)
## Overall MSA = 0.7
## MSA for each item =
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## 0.42 0.70 0.73 0.58 0.83 0.73 0.77 0.79 0.40
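The KMO measure compares the squared correlations with the squared partial correlations, both overall and per item; a common rule of thumb treats values below 0.5 as inadequate. The sketch below flags the items under that threshold using the per-item values printed above, and shows how the measure can be computed by hand. `kmo_by_hand` is an illustrative helper and the 0.5 cutoff is a convention, not something the original analysis states.

```r
# Per-item MSA values copied from the KMO output above.
msa <- c(V1 = 0.42, V2 = 0.70, V3 = 0.73, V4 = 0.58, V5 = 0.83,
         V6 = 0.73, V7 = 0.77, V8 = 0.79, V9 = 0.40)

# Items below the conventional 0.5 threshold (assumed cutoff).
low_msa <- names(msa)[msa < 0.5]
print(low_msa)

# Hypothetical helper: KMO from correlations and partial correlations.
kmo_by_hand <- function(x) {
  R <- cor(x)
  Rinv <- solve(R)
  # Partial correlations from the scaled inverse correlation matrix.
  Q <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))
  diag(R) <- 0
  diag(Q) <- 0
  list(overall = sum(R^2) / (sum(R^2) + sum(Q^2)),
       items = colSums(R^2) / (colSums(R^2) + colSums(Q^2)))
}

# Simulated stand-in data (NOT the places data).
set.seed(1)
common <- rnorm(329)
sim <- sapply(1:9, function(i) common + rnorm(329))
colnames(sim) <- paste0("V", 1:9)
kmo_sim <- kmo_by_hand(sim)
```

In this output, V1 and V9 fall below 0.5, so those two variables contribute least to the shared correlation structure.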
Performing factor analysis with maximum likelihood (ML) estimation and “varimax” rotation
ml<-fa(r=new.data_places,
nfactors=3,
rotate="varimax",
fm="ml",
residuals=TRUE)
ml
## Factor Analysis using method = ml
## Call: fa(r = new.data_places, nfactors = 3, rotate = "varimax", residuals = TRUE,
## fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML2 ML1 ML3 h2 u2 com
## V1 0.11 0.25 0.06 0.077 0.923 1.5
## V2 0.28 0.95 0.11 0.995 0.005 1.2
## V3 0.87 0.18 0.17 0.815 0.185 1.2
## V4 0.09 0.07 0.50 0.258 0.742 1.1
## V5 0.37 0.17 0.48 0.398 0.602 2.1
## V6 0.51 0.06 0.08 0.264 0.736 1.1
## V7 0.61 0.29 0.56 0.771 0.229 2.4
## V8 0.10 0.39 0.56 0.479 0.521 1.9
## V9 -0.03 0.30 0.16 0.117 0.883 1.5
##
## ML2 ML1 ML3
## SS loadings 1.61 1.37 1.19
## Proportion Var 0.18 0.15 0.13
## Cumulative Var 0.18 0.33 0.46
## Proportion Explained 0.39 0.33 0.29
## Cumulative Proportion 0.39 0.71 1.00
##
## Mean item complexity = 1.6
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 2.59 with Chi Square of 839.43
## The degrees of freedom for the model are 12 and the objective function was 0.26
##
## The root mean square of the residuals (RMSR) is 0.06
## The df corrected root mean square of the residuals is 0.1
##
## The harmonic number of observations is 329 with the empirical chi square 86.12 with prob < 2.8e-13
## The total number of observations was 329 with Likelihood Chi Square = 82.31 with prob < 1.5e-12
##
## Tucker Lewis Index of factoring reliability = 0.736
## RMSEA index = 0.133 and the 90 % confidence intervals are 0.107 0.162
## BIC = 12.75
## Fit based upon off diagonal values = 0.96
## Measures of factor score adequacy
## ML2 ML1 ML3
## Correlation of (regression) scores with factors 0.89 0.99 0.80
## Multiple R square of scores with factors 0.80 0.98 0.64
## Minimum correlation of possible factor scores 0.60 0.96 0.29
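The `SS loadings`, `Proportion Var`, `Cumulative Var`, and `h2` values in the output above all follow from the loading matrix: column sums of squared loadings give the SS loadings, and row sums give the communalities. The sketch below reproduces them from the rounded loadings printed above, so the results match the output only up to rounding. The object name `ml_loadings` is introduced here for illustration.

```r
# Rounded ML loadings copied from the output above.
ml_loadings <- matrix(
  c( 0.11, 0.25, 0.06,
     0.28, 0.95, 0.11,
     0.87, 0.18, 0.17,
     0.09, 0.07, 0.50,
     0.37, 0.17, 0.48,
     0.51, 0.06, 0.08,
     0.61, 0.29, 0.56,
     0.10, 0.39, 0.56,
    -0.03, 0.30, 0.16),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("V", 1:9), c("ML2", "ML1", "ML3")))

ss_loadings <- colSums(ml_loadings^2)            # variance carried by each factor
prop_var    <- ss_loadings / nrow(ml_loadings)   # divided by the number of variables
cum_var     <- cumsum(prop_var)                  # running total across factors
h2          <- rowSums(ml_loadings^2)            # communality of each variable

print(round(rbind(ss_loadings, prop_var, cum_var), 2))
```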
Performing factor analysis with principal axis (PA) factoring and “varimax” rotation
pa<-fa(r=new.data_places,
nfactors=3,
rotate="varimax",
fm="pa",
residuals=TRUE)
## maximum iteration exceeded
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
pa
## Factor Analysis using method = pa
## Call: fa(r = new.data_places, nfactors = 3, rotate = "varimax", residuals = TRUE,
## fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA3 PA2 h2 u2 com
## V1 0.07 0.07 1.05 1.11 -0.11 1.0
## V2 0.36 0.50 0.18 0.41 0.59 2.1
## V3 0.87 0.13 0.09 0.78 0.22 1.1
## V4 0.11 0.44 0.15 0.22 0.78 1.4
## V5 0.47 0.39 -0.03 0.38 0.62 1.9
## V6 0.51 0.05 0.02 0.27 0.73 1.0
## V7 0.68 0.52 0.09 0.74 0.26 1.9
## V8 0.20 0.66 0.07 0.48 0.52 1.2
## V9 0.02 0.37 -0.09 0.15 0.85 1.1
##
## PA1 PA3 PA2
## SS loadings 1.90 1.46 1.19
## Proportion Var 0.21 0.16 0.13
## Cumulative Var 0.21 0.37 0.51
## Proportion Explained 0.42 0.32 0.26
## Cumulative Proportion 0.42 0.74 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 2.59 with Chi Square of 839.43
## The degrees of freedom for the model are 12 and the objective function was 0.3
##
## The root mean square of the residuals (RMSR) is 0.05
## The df corrected root mean square of the residuals is 0.09
##
## The harmonic number of observations is 329 with the empirical chi square 68.23 with prob < 6.9e-10
## The total number of observations was 329 with Likelihood Chi Square = 96.38 with prob < 2.8e-15
##
## Tucker Lewis Index of factoring reliability = 0.683
## RMSEA index = 0.146 and the 90 % confidence intervals are 0.12 0.174
## BIC = 26.83
## Fit based upon off diagonal values = 0.97
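The ultra-Heywood warning refers to a communality above 1, which implies a negative uniqueness. The sketch below locates the offending variable from the `h2` column printed above. On a fitted object the same check could presumably be run on `pa$communality`, but that is an assumption about the returned object rather than something shown in this document.

```r
# Communalities (h2) copied from the PA output above.
h2 <- c(V1 = 1.11, V2 = 0.41, V3 = 0.78, V4 = 0.22, V5 = 0.38,
        V6 = 0.27, V7 = 0.74, V8 = 0.48, V9 = 0.15)

# Uniqueness is 1 - h2, so a communality above 1 gives a negative uniqueness.
u2 <- 1 - h2

heywood <- names(h2)[h2 > 1]
print(heywood)
print(u2[heywood])
```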
Discussion:
As the results above show, the three factors account for a cumulative 51% of the variance under principal axis factoring, compared with 46% under maximum likelihood. Because the principal axis solution explains more variance, we use its output to define the factors. Note, however, that the principal axis fit raised an ultra-Heywood warning (the communality of V1 is 1.11, which is above 1), so its loadings should be interpreted with caution.
fa.diagram(pa,main="new.data_placespc")
Interpretation: The variables are labeled as follows:
V1 is Climate
V2 is Housing
V3 is Health
V4 is Crime
V5 is Transportation
V6 is Education
V7 is Arts
V8 is Recreation
V9 is Economics
Based on the diagram above:
Factor 1 (PA1): primarily a measure of Health, but it also increases with higher scores for Transportation, Education, and the Arts.
Factor 2 (PA3): primarily a measure of Crime, Recreation, Economics, and Housing.
Factor 3 (PA2): a measure of Climate alone.
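The grouping above can be cross-checked by assigning each variable to the factor on which it has the largest absolute loading. This is only a rough sketch of that rule, using the rounded PA loadings printed earlier; it is not claimed to be exactly how `fa.diagram()` draws its paths, and `pa_loadings` is a name introduced here.

```r
# Rounded PA loadings copied from the output above, with descriptive row names.
pa_loadings <- matrix(
  c(0.07, 0.07,  1.05,
    0.36, 0.50,  0.18,
    0.87, 0.13,  0.09,
    0.11, 0.44,  0.15,
    0.47, 0.39, -0.03,
    0.51, 0.05,  0.02,
    0.68, 0.52,  0.09,
    0.20, 0.66,  0.07,
    0.02, 0.37, -0.09),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("Climate", "Housing", "Health", "Crime", "Transportation",
      "Education", "Arts", "Recreation", "Economics"),
    c("PA1", "PA3", "PA2")))

# For each variable, pick the factor with the largest absolute loading.
dominant <- colnames(pa_loadings)[max.col(abs(pa_loadings), ties.method = "first")]
names(dominant) <- rownames(pa_loadings)

# Group variable names by their dominant factor.
groups <- split(names(dominant), dominant)
print(groups)
```

Several variables also load noticeably on a second factor (for example, Arts loads 0.52 on PA3), so this assignment simplifies the full loading pattern.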