Factor Analysis in R

library("psych")   # fa(), KMO(), fa.diagram()
library("REdaS")   # bart_spher()
## Loading required package: grid
# Load the data (a whitespace-delimited text file without a header row)
places <- read.table("C:/1. School/School/4th Year/2nd Semester/Multivariate Data analysis/Activities/Activity 2/places.txt", quote="\"", comment.char="")
head(places)
##    V1   V2   V3   V4   V5   V6   V7   V8   V9 V10
## 1 521 6200  237  923 4031 2757  996 1405 7633   1
## 2 575 8138 1656  886 4883 2438 5564 2632 4350   2
## 3 468 7339  618  970 2531 2560  237  859 5250   3
## 4 476 7908 1431  610 6883 3399 4655 1617 5864   4
## 5 659 8393 1853 1483 6558 3026 4496 2612 5727   5
## 6 520 5819  640  727 2444 2972  334 1018 5254   6

Select the nine rating variables (V1 to V9) for the analysis. V10 is only a case identifier, so it is dropped.

placespc <- places[, 1:9]  # keep V1..V9, drop the ID column V10
head(placespc)
##    V1   V2   V3   V4   V5   V6   V7   V8   V9
## 1 521 6200  237  923 4031 2757  996 1405 7633
## 2 575 8138 1656  886 4883 2438 5564 2632 4350
## 3 468 7339  618  970 2531 2560  237  859 5250
## 4 476 7908 1431  610 6883 3399 4655 1617 5864
## 5 659 8393 1853 1483 6558 3026 4496 2612 5727
## 6 520 5819  640  727 2444 2972  334 1018 5254

Plot a histogram of each of the nine variables, three panels per figure, to check the shape of their distributions.

par("mfcol"=c(3, 1))
hist(placespc$V1, col="blue")
hist(placespc$V2, col="green")
hist(placespc$V3, col="red")

hist(placespc$V4, col="blue")
hist(placespc$V5, col="green")
hist(placespc$V6, col="red")

hist(placespc$V7, col="blue")
hist(placespc$V8, col="green")
hist(placespc$V9, col="red")

par("mfcol"=c(1, 1))

Discussion:

As the histograms above show, most of the variables are right-skewed, with long tails to the right. We apply a log transformation to each variable to make the distributions more symmetric (closer to normal).
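A sample skewness coefficient puts a number on the skew, so the transformation can be checked without judging the histograms by eye. The sketch below defines its own helper `skewness()` (hypothetical, not taken from the packages loaded above). It uses simulated log-normal data as a stand-in for one of the places ratings, so the printed values are illustrative only.

```r
# Hypothetical helper: moment-based sample skewness (no small-sample correction)
skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

set.seed(1)
# Simulated right-skewed ratings (assumption: a stand-in for one column of placespc)
x <- rlnorm(329, meanlog = 8, sdlog = 0.5)

skew_raw <- skewness(x)          # clearly positive: long right tail
skew_log <- skewness(log10(x))   # near zero after the log transformation
round(c(raw = skew_raw, log10 = skew_log), 2)
```

On the real data, comparing `sapply(placespc, skewness)` with `sapply(log10(placespc), skewness)` would give the same before-and-after check for all nine variables.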

Transform every variable with a base-10 logarithm to reduce the right skew:

new.data_places<-log10(placespc)
head(new.data_places)
##         V1       V2       V3       V4       V5       V6       V7       V8
## 1 2.716838 3.792392 2.374748 2.965202 3.605413 3.440437 2.998259 3.147676
## 2 2.759668 3.910518 3.219060 2.947434 3.688687 3.387034 3.745387 3.420286
## 3 2.670246 3.865637 2.790988 2.986772 3.403292 3.408240 2.374748 2.933993
## 4 2.677607 3.898067 3.155640 2.785330 3.837778 3.531351 3.667920 3.208710
## 5 2.818885 3.923917 3.267875 3.171141 3.816771 3.480869 3.652826 3.416973
## 6 2.716003 3.764848 2.806180 2.861534 3.388101 3.473049 2.523746 3.007748
##         V9
## 1 3.882695
## 2 3.638489
## 3 3.720159
## 4 3.768194
## 5 3.757927
## 6 3.720490
# Bartlett's test of sphericity (H0: the correlation matrix is an identity matrix)
bart_spher(new.data_places)
##  Bartlett's Test of Sphericity
## 
## Call: bart_spher(x = new.data_places)
## 
##      X2 = 839.427
##      df = 36
## p-value < 2.22e-16
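Bartlett's test asks whether the correlation matrix differs from an identity matrix. If it did not, the variables would share nothing to factor. The usual statistic is X2 = -(n - 1 - (2p + 5)/6) ln|R|, and it is compared against a chi-square distribution with p(p - 1)/2 degrees of freedom. With p = 9 that gives the 36 df shown above. With n = 329 the multiplier is about 324.17, and 324.17 × 2.59 (the null-model objective function that fa() prints later) is roughly 839.6, which agrees with X2 = 839.427. The sketch below computes the statistic in base R. The function name `bartlett_sphericity()` and the simulated one-factor data are assumptions made for illustration.

```r
# Hypothetical helper: Bartlett's test of sphericity from the sample correlation matrix
bartlett_sphericity <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  R <- cor(X)
  stat <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(X2 = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(2)
n <- 329
f <- rnorm(n)                                      # one shared latent factor (simulated)
X <- sapply(1:9, function(j) 0.7 * f + rnorm(n))   # nine correlated indicators
res <- bartlett_sphericity(X)
res$df        # 36 degrees of freedom for 9 variables
res$p.value   # tiny: reject sphericity, so factoring is worthwhile
```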
# Kaiser-Meyer-Olkin measure of sampling adequacy (overall and per variable)
KMO(new.data_places)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = new.data_places)
## Overall MSA =  0.7
## MSA for each item = 
##   V1   V2   V3   V4   V5   V6   V7   V8   V9 
## 0.42 0.70 0.73 0.58 0.83 0.73 0.77 0.79 0.40
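The KMO measure compares the squared correlations with the squared partial correlations, which are obtained from the inverse correlation matrix. When the partial correlations are small relative to the raw correlations, the MSA approaches 1. A commonly used rule of thumb treats values below about 0.5 as poor. By that rule V1 (0.42) and V9 (0.40) are weak candidates for this factor model. The sketch below implements the standard Kaiser formula in base R. The helper name `kmo()` and the simulated data (eight indicators of one factor plus one pure-noise column) are assumptions. We believe psych's KMO() follows the same definition, but this code is not its source.

```r
# Hypothetical helper: Kaiser-Meyer-Olkin measure of sampling adequacy
kmo <- function(X) {
  R <- cor(as.matrix(X))
  Rinv <- solve(R)
  # Partial correlations: -Rinv[i, j] / sqrt(Rinv[i, i] * Rinv[j, j])
  P <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))
  diag(R) <- 0                     # only off-diagonal terms enter the sums
  diag(P) <- 0
  r2 <- R^2
  p2 <- P^2
  list(overall = sum(r2) / (sum(r2) + sum(p2)),
       item    = colSums(r2) / (colSums(r2) + colSums(p2)))
}

set.seed(3)
n <- 329
f <- rnorm(n)
# Simulated data: columns 1-8 share a factor, column 9 is unrelated noise
X <- cbind(sapply(1:8, function(j) 0.7 * f + rnorm(n)), rnorm(n))
k <- kmo(X)
round(k$overall, 2)
round(k$item, 2)
```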

Perform factor analysis with the maximum likelihood (ML) factoring method and varimax rotation, extracting three factors:

ml<-fa(r=new.data_places,
       nfactors=3,
       rotate="varimax",
       fm="ml",
       residuals=TRUE)
ml
## Factor Analysis using method =  ml
## Call: fa(r = new.data_places, nfactors = 3, rotate = "varimax", residuals = TRUE, 
##     fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##      ML2  ML1  ML3    h2    u2 com
## V1  0.11 0.25 0.06 0.077 0.923 1.5
## V2  0.28 0.95 0.11 0.995 0.005 1.2
## V3  0.87 0.18 0.17 0.815 0.185 1.2
## V4  0.09 0.07 0.50 0.258 0.742 1.1
## V5  0.37 0.17 0.48 0.398 0.602 2.1
## V6  0.51 0.06 0.08 0.264 0.736 1.1
## V7  0.61 0.29 0.56 0.771 0.229 2.4
## V8  0.10 0.39 0.56 0.479 0.521 1.9
## V9 -0.03 0.30 0.16 0.117 0.883 1.5
## 
##                        ML2  ML1  ML3
## SS loadings           1.61 1.37 1.19
## Proportion Var        0.18 0.15 0.13
## Cumulative Var        0.18 0.33 0.46
## Proportion Explained  0.39 0.33 0.29
## Cumulative Proportion 0.39 0.71 1.00
## 
## Mean item complexity =  1.6
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  36  and the objective function was  2.59 with Chi Square of  839.43
## The degrees of freedom for the model are 12  and the objective function was  0.26 
## 
## The root mean square of the residuals (RMSR) is  0.06 
## The df corrected root mean square of the residuals is  0.1 
## 
## The harmonic number of observations is  329 with the empirical chi square  86.12  with prob <  2.8e-13 
## The total number of observations was  329  with Likelihood Chi Square =  82.31  with prob <  1.5e-12 
## 
## Tucker Lewis Index of factoring reliability =  0.736
## RMSEA index =  0.133  and the 90 % confidence intervals are  0.107 0.162
## BIC =  12.75
## Fit based upon off diagonal values = 0.96
## Measures of factor score adequacy             
##                                                    ML2  ML1  ML3
## Correlation of (regression) scores with factors   0.89 0.99 0.80
## Multiple R square of scores with factors          0.80 0.98 0.64
## Minimum correlation of possible factor scores     0.60 0.96 0.29
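Base R's stats::factanal() also fits maximum likelihood factor analysis with varimax rotation, so it can serve as a cross-check on the fa() result. The sketch below runs it on simulated data. The data have three uncorrelated factors with three indicators each, an assumption made purely for illustration. The code also shows how the h2, u2, SS loadings and Proportion Var rows of the fa() output are derived from the loading matrix. With 9 variables and 3 factors, factanal() reports the same 12 model degrees of freedom that fa() printed above.

```r
set.seed(4)
n <- 329
# Simulated data (assumption): 3 uncorrelated factors, 3 indicators per factor
fac <- matrix(rnorm(n * 3), n, 3)
Lambda <- matrix(0, 9, 3)
Lambda[1:3, 1] <- 0.8
Lambda[4:6, 2] <- 0.7
Lambda[7:9, 3] <- 0.6
X <- fac %*% t(Lambda) + matrix(rnorm(n * 9, sd = 0.6), n, 9)
colnames(X) <- paste0("V", 1:9)

# Maximum likelihood factor analysis with varimax rotation (base R)
fit <- factanal(X, factors = 3, rotation = "varimax")
L <- unclass(fit$loadings)     # 9 x 3 rotated loading matrix

h2 <- rowSums(L^2)             # communalities (h2 column)
u2 <- fit$uniquenesses         # uniquenesses (u2 column), about 1 - h2
ss <- colSums(L^2)             # SS loadings per factor
round(cbind(L, h2 = h2, u2 = u2), 2)
round(rbind(SS = ss, PropVar = ss / ncol(X)), 2)
c(chisq = unname(fit$STATISTIC), df = fit$dof, p = fit$PVAL)
```

On the real data the equivalent call would be `factanal(new.data_places, factors = 3, rotation = "varimax")`. Its loadings would likely resemble the ML solution above up to column order and sign. The two functions use different optimizers, though, so small differences are possible, and that output is not shown here.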

Perform factor analysis with the principal axis (PA) factoring method and varimax rotation, again extracting three factors:

pa<-fa(r=new.data_places,
       nfactors=3,
       rotate="varimax",
       fm="pa",
       residuals=TRUE)
## maximum iteration exceeded
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
pa
## Factor Analysis using method =  pa
## Call: fa(r = new.data_places, nfactors = 3, rotate = "varimax", residuals = TRUE, 
##     fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##     PA1  PA3   PA2   h2    u2 com
## V1 0.07 0.07  1.05 1.11 -0.11 1.0
## V2 0.36 0.50  0.18 0.41  0.59 2.1
## V3 0.87 0.13  0.09 0.78  0.22 1.1
## V4 0.11 0.44  0.15 0.22  0.78 1.4
## V5 0.47 0.39 -0.03 0.38  0.62 1.9
## V6 0.51 0.05  0.02 0.27  0.73 1.0
## V7 0.68 0.52  0.09 0.74  0.26 1.9
## V8 0.20 0.66  0.07 0.48  0.52 1.2
## V9 0.02 0.37 -0.09 0.15  0.85 1.1
## 
##                        PA1  PA3  PA2
## SS loadings           1.90 1.46 1.19
## Proportion Var        0.21 0.16 0.13
## Cumulative Var        0.21 0.37 0.51
## Proportion Explained  0.42 0.32 0.26
## Cumulative Proportion 0.42 0.74 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  36  and the objective function was  2.59 with Chi Square of  839.43
## The degrees of freedom for the model are 12  and the objective function was  0.3 
## 
## The root mean square of the residuals (RMSR) is  0.05 
## The df corrected root mean square of the residuals is  0.09 
## 
## The harmonic number of observations is  329 with the empirical chi square  68.23  with prob <  6.9e-10 
## The total number of observations was  329  with Likelihood Chi Square =  96.38  with prob <  2.8e-15 
## 
## Tucker Lewis Index of factoring reliability =  0.683
## RMSEA index =  0.146  and the 90 % confidence intervals are  0.12 0.174
## BIC =  26.83
## Fit based upon off diagonal values = 0.97
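Principal axis factoring starts from initial communality estimates, commonly the squared multiple correlations. It then repeatedly places the current communalities on the diagonal of the correlation matrix, takes the leading eigenvectors of that reduced matrix, and recomputes the communalities from the resulting loadings. fa() printed "maximum iteration exceeded" for this fit, which suggests its iterations stopped before the communalities settled. A communality reaching or exceeding 1 is the Heywood or ultra-Heywood situation it warned about. The sketch below is a minimal, unrotated version of the iteration written for illustration. The function name `principal_axis()`, its defaults and the simulated data are assumptions, and psych's own implementation and defaults may differ.

```r
# Hypothetical helper: iterated principal axis factoring (unrotated loadings)
principal_axis <- function(R, nfactors, max_iter = 1000, tol = 1e-6) {
  h2 <- 1 - 1 / diag(solve(R))                 # start: squared multiple correlations
  for (iter in seq_len(max_iter)) {
    Rr <- R
    diag(Rr) <- h2                             # reduced correlation matrix
    e <- eigen(Rr, symmetric = TRUE)
    vals <- pmax(e$values[1:nfactors], 0)      # guard against negative eigenvalues
    L <- e$vectors[, 1:nfactors, drop = FALSE] %*% diag(sqrt(vals), nfactors)
    h2_new <- rowSums(L^2)                     # updated communalities
    converged <- max(abs(h2_new - h2)) < tol
    h2 <- h2_new
    if (converged) break
  }
  list(loadings = L, communality = h2, iterations = iter,
       converged = converged,
       improper = any(h2 >= 1))                # Heywood / ultra-Heywood flag
}

set.seed(5)
n <- 329
fac <- matrix(rnorm(n * 3), n, 3)              # simulated data (assumption)
Lambda <- matrix(0, 9, 3)
Lambda[1:3, 1] <- 0.8
Lambda[4:6, 2] <- 0.7
Lambda[7:9, 3] <- 0.6
X <- fac %*% t(Lambda) + matrix(rnorm(n * 9, sd = 0.6), n, 9)

res <- principal_axis(cor(X), nfactors = 3)
round(res$communality, 2)
res[c("iterations", "converged", "improper")]
```

Running `principal_axis(cor(new.data_places), 3)` would show whether V1's communality drifts above 1 on the real data as well. That result is not reproduced here.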

Discussion:

As shown above, the Principal Axis (PA) solution has a cumulative variance of 0.51 and the Maximum Likelihood (ML) solution has a cumulative variance of 0.46. In other words, the three PA factors account for more of the total variance than the three ML factors, so we use the PA output to interpret the factors. This choice should be treated with some caution. fa() reported an ultra-Heywood case for the PA fit: V1 has a communality (h2) of 1.11 and a negative uniqueness (u2) of -0.11, so part of the PA variance comes from an improper estimate. The ML solution also fits somewhat better on the Tucker Lewis Index (0.736 vs 0.683), RMSEA (0.133 vs 0.146) and BIC (12.75 vs 26.83).
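The summary rows of the PA output come straight from the loading matrix. Each variable's communality is the sum of its squared loadings, each factor's SS loading is the sum of its column's squared loadings, and Proportion Var is the SS loading divided by the number of variables. The sketch below reproduces these numbers from the rotated PA loadings copied out of the table above. Because those loadings are rounded to two decimals, the results match the printed values only approximately. In particular, the cumulative variance comes out near 0.50 rather than 0.51. Even with V1's communality capped at 1, the cumulative variance would still be about 0.49, so the Heywood estimate accounts for only part of the gap with ML.

```r
# Rotated PA loadings copied from the output above (rounded to two decimals)
L <- matrix(c(
  0.07, 0.07,  1.05,
  0.36, 0.50,  0.18,
  0.87, 0.13,  0.09,
  0.11, 0.44,  0.15,
  0.47, 0.39, -0.03,
  0.51, 0.05,  0.02,
  0.68, 0.52,  0.09,
  0.20, 0.66,  0.07,
  0.02, 0.37, -0.09),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("V", 1:9), c("PA1", "PA3", "PA2")))

h2 <- rowSums(L^2)        # communalities; V1 is about 1.11 (> 1: ultra-Heywood)
u2 <- 1 - h2              # uniquenesses; V1 is negative
ss <- colSums(L^2)        # SS loadings per factor
round(h2, 2)
round(ss / nrow(L), 2)    # Proportion Var: 0.21 0.16 0.13
sum(ss) / nrow(L)         # Cumulative Var, about 0.50 with rounded loadings
```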

fa.diagram(pa, main = "Principal axis factors (varimax), new.data_places")

Interpretation: the nine variables are

V1: Climate

V2: Housing

V3: Health

V4: Crime

V5: Transportation

V6: Education

V7: Arts

V8: Recreation

V9: Economics

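Based on the diagram above, the three factors can be interpreted as follows:

Factor 1 (PA1): primarily a measure of Health. It also increases with the scores for Transportation, Education and the Arts.

Factor 2 (PA3): primarily a measure of Recreation, Housing, Crime and Economics, in decreasing order of loading.

Factor 3 (PA2): a measure of Climate alone. V1's loading on this factor (1.05) exceeds 1 because of the ultra-Heywood case that fa() warned about, so this factor should be interpreted with care.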
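By default, fa.diagram() draws each variable only on the factor where it loads most strongly and hides loadings below a small cut-off. We believe the default cut-off is 0.3, but this is an assumption. The sketch below applies that rule to the rounded PA loadings from the output above and recovers the same three groups of variables used in the interpretation. The variable labels come from the list above, and the object names are illustrative.

```r
# Rotated PA loadings copied from the output above (rounded to two decimals)
L <- matrix(c(
  0.07, 0.07,  1.05,
  0.36, 0.50,  0.18,
  0.87, 0.13,  0.09,
  0.11, 0.44,  0.15,
  0.47, 0.39, -0.03,
  0.51, 0.05,  0.02,
  0.68, 0.52,  0.09,
  0.20, 0.66,  0.07,
  0.02, 0.37, -0.09),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("V", 1:9), c("PA1", "PA3", "PA2")))

var_names <- c(V1 = "Climate", V2 = "Housing", V3 = "Health",
               V4 = "Crime", V5 = "Transportation", V6 = "Education",
               V7 = "Arts", V8 = "Recreation", V9 = "Economics")

# Assign each variable to the factor with its largest absolute loading,
# keeping only assignments whose loading is at least 0.3 (assumed cut-off)
best <- apply(abs(L), 1, which.max)
keep <- abs(L[cbind(seq_len(nrow(L)), best)]) >= 0.3
groups <- split(var_names[keep], colnames(L)[best][keep])
groups
```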