Data Overview

# Dataset

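library(psych)      # describe(), KMO(), cortest.bartlett(), fa(), fa.parallel(), fa.diagram()
library(rmarkdown)  # paged_table()
library(corrplot)   # corrplot()
library(ggplot2)    # ggplot()
places <- read.table("C:/Users/63906/Downloads/places.txt")
paged_table(places)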

Describing the data

A <- describe(places)
paged_table(A)

Use the dim function to retrieve the dimensions of the dataset (rows first, then columns).

dim(places)
[1] 329  10

Cleaning data

The last column of our data frame is the variable V10. Using -10 as the column index removes it, and we save the result to a new object.

place <- places[, -10]
head(place)
   V1   V2   V3   V4   V5   V6   V7   V8   V9
1 521 6200  237  923 4031 2757  996 1405 7633
2 575 8138 1656  886 4883 2438 5564 2632 4350
3 468 7339  618  970 2531 2560  237  859 5250
4 476 7908 1431  610 6883 3399 4655 1617 5864
5 659 8393 1853 1483 6558 3026 4496 2612 5727
6 520 5819  640  727 2444 2972  334 1018 5254
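
The negative-index drop used above can be sketched on a small stand-in data frame. The object toy and its values are made up for illustration; only the indexing idiom is the point. Dropping by name is shown as an alternative that does not depend on column order.

```r
# Toy data frame standing in for `places` (values are invented).
toy <- data.frame(V1 = c(521, 575), V2 = c(6200, 8138), V10 = c(1, 2))

# Drop the last column by position; ncol() avoids hard-coding the index.
by_position <- toy[, -ncol(toy)]

# Drop the same column by name, which still works if columns are reordered.
by_name <- toy[, setdiff(names(toy), "V10")]

names(by_position)
names(by_name)
```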

Correlation matrix

We should also examine the correlations among the variables to judge whether factor analysis is appropriate.

datamatrix <- cor(place)  # place already holds only V1 to V9, so no column needs dropping
corrplot(datamatrix, method="number")
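
One informal way to read a correlation matrix for factorability is to check how many off-diagonal correlations are of at least moderate size. The sketch below uses a made-up 3 x 3 matrix and a 0.3 cutoff; the cutoff is a rule of thumb assumed for this example, not something this analysis prescribes.

```r
# Hypothetical correlation matrix for three variables A, B and C.
R <- matrix(c(1.0, 0.6, 0.1,
              0.6, 1.0, 0.4,
              0.1, 0.4, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Each pair appears once in the upper triangle.
off_diag <- R[upper.tri(R)]

# Share of variable pairs with |r| at or above the assumed 0.3 cutoff.
share_above <- mean(abs(off_diag) >= 0.3)
share_above
```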

The Factorability of the Data

X <- place       # the nine variables V1 to V9 (V10 was already dropped)
Y <- place[, 9]  # V9 on its own; not used in the steps below

KMO (Kaiser-Meyer-Olkin)

The Kaiser-Meyer-Olkin (KMO) statistic measures sampling adequacy and gives a more direct check of factorability than inspecting the correlation matrix by eye.

KMO(r=cor(X))
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = cor(X))
Overall MSA =  0.7
MSA for each item = 
  V1   V2   V3   V4   V5   V6   V7   V8   V9 
0.55 0.69 0.69 0.64 0.86 0.74 0.71 0.81 0.38 

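The overall KMO is 0.7, which suggests that we can probably conduct a factor analysis. The per-item values are less uniform: V9 (0.38) falls below the commonly cited 0.5 cutoff, and V1 (0.55) is only just above it.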
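
As a rough illustration of what the KMO statistic measures, the sketch below recomputes it in base R from a correlation matrix by comparing squared correlations with squared partial correlations. The function name kmo_sketch and the 3-variable matrix are invented for this example; KMO() from psych remains the function used in the analysis.

```r
# Minimal KMO sketch: R must be an invertible correlation matrix.
kmo_sketch <- function(R) {
  R_inv <- solve(R)
  # Partial correlations: rescale the inverse to unit diagonal and flip the sign.
  Q <- -cov2cor(R_inv)
  diag(Q) <- 0
  diag(R) <- 0
  r2 <- R^2
  q2 <- Q^2
  list(
    overall  = sum(r2) / (sum(r2) + sum(q2)),
    per_item = colSums(r2) / (colSums(r2) + colSums(q2))
  )
}

# Hypothetical matrix: three variables that all correlate 0.5 with each other.
R_toy <- matrix(0.5, nrow = 3, ncol = 3)
diag(R_toy) <- 1
kmo_toy <- kmo_sketch(R_toy)
kmo_toy$overall   # 9/13, about 0.69, for this matrix
```

Values near 1 mean the partial correlations are small relative to the raw correlations, which is the pattern expected when a few common factors drive the variables.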

Bartlett’s Test of Sphericity

cortest.bartlett(X)
R was not square, finding R from data
$chisq
[1] 1051.614

$p.value
[1] 2.302138e-197

$df
[1] 36

The p-value (2.30e-197) is far below 0.05, so we reject the hypothesis that the correlation matrix is an identity matrix. This indicates that a factor analysis may be useful with our data.

det(cor(X))
[1] 0.03900546

The determinant is positive, so the correlation matrix is invertible and the factor analysis will probably run.
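
The determinant also ties back to Bartlett's test. A short worked check, using only the numbers reported above (329 rows, 9 variables, determinant 0.03900546) and the standard form of Bartlett's statistic, reproduces the chi-square and degrees of freedom. The variable names are chosen for this example.

```r
n     <- 329          # rows, from dim(places)
p     <- 9            # variables V1 to V9
det_R <- 0.03900546   # det(cor(X)) as printed above

# Bartlett's statistic: -(n - 1 - (2p + 5) / 6) * log(det(R))
chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det_R)
df    <- p * (p - 1) / 2
p_value <- pchisq(chisq, df, lower.tail = FALSE)

chisq   # about 1051.61, in line with the reported $chisq
df      # 36, in line with the reported $df
```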

The Number of Factors to Extract

Scree Plot

fafitfree <- fa(X, nfactors = ncol(X), rotate = "none")
n_factors <- length(fafitfree$e.values)
scree     <- data.frame(
  Factor_n =  as.factor(1:n_factors), 
  Eigenvalue = fafitfree$e.values)
ggplot(scree, aes(x = Factor_n, y = Eigenvalue, group = 1)) + 
  geom_point() + geom_line() +
  xlab("Number of factors") +
  ylab("Initial eigenvalue") +
  labs( title = "Scree Plot", 
        subtitle = "(Based on the unreduced correlation matrix)")
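
The eigenvalues behind a scree plot of the unreduced correlation matrix can also be obtained with base R. The sketch below uses simulated data (the object toy and its columns are invented), so the numbers say nothing about places; it only shows the computation and the fact that the eigenvalues sum to the number of variables.

```r
set.seed(1)
n <- 200
shared <- rnorm(n)   # one common source of variation for x1 to x3

toy <- data.frame(
  x1 = shared + rnorm(n, sd = 0.5),
  x2 = shared + rnorm(n, sd = 0.5),
  x3 = shared + rnorm(n, sd = 0.5),
  x4 = rnorm(n),     # unrelated noise
  x5 = rnorm(n)      # unrelated noise
)

# Eigenvalues of the correlation matrix, returned in decreasing order.
ev <- eigen(cor(toy), symmetric = TRUE, only.values = TRUE)$values

sum(ev)                      # equals ncol(toy), the trace of the matrix
n_above_one <- sum(ev > 1)   # count under the eigenvalue-greater-than-one rule
```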

Parallel Analysis

parallel <- fa.parallel(X)

Parallel analysis suggests that the number of factors =  5  and the number of components =  1 
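
The idea behind the component count in parallel analysis can be sketched in base R: keep the leading eigenvalues that exceed what random, uncorrelated data of the same size would produce. This is a simplified sketch, not a reimplementation of fa.parallel, which also reports a factor count that this code does not attempt. The function name parallel_sketch, the 200 simulations, the 95th percentile and the simulated data are all assumptions of the example.

```r
parallel_sketch <- function(data, n_sim = 200, quant = 0.95) {
  n <- nrow(data)
  p <- ncol(data)
  observed <- eigen(cor(data), symmetric = TRUE, only.values = TRUE)$values

  # Eigenvalues of correlation matrices from pure-noise data of the same shape.
  simulated <- replicate(n_sim, {
    noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(noise), symmetric = TRUE, only.values = TRUE)$values
  })

  # One threshold per eigenvalue position (rows of `simulated`).
  threshold <- apply(simulated, 1, quantile, probs = quant)

  # Count leading eigenvalues that beat their noise threshold.
  exceeds <- observed > threshold
  n_keep <- if (all(exceeds)) p else which(!exceeds)[1] - 1

  list(observed = observed, threshold = threshold, n_components = n_keep)
}

# Simulated data with two clear groups of three variables each.
set.seed(42)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- cbind(
  f1 + rnorm(n, sd = 0.5), f1 + rnorm(n, sd = 0.5), f1 + rnorm(n, sd = 0.5),
  f2 + rnorm(n, sd = 0.5), f2 + rnorm(n, sd = 0.5), f2 + rnorm(n, sd = 0.5)
)
result <- parallel_sketch(toy)
result$n_components
```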

Conducting the Factor Analysis

Factor Analysis Using fa Method

fa.none <- fa(r = X,
  nfactors = 5,
  # covar = FALSE, SMC = TRUE,
  fm = "pa",          # factoring method ("pa" is principal axis factoring)
  max.iter = 100,     # the default is 50; raised to 100
  rotate = "varimax") # varimax rotation, despite the object name fa.none
maximum iteration exceeded
Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
The estimated weights for the factor scores are probably incorrect. Try a
different factor score estimation method.
Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
ultra-Heywood case was detected. Examine the results carefully
print(fa.none)
Factor Analysis using method =  pa
Call: fa(r = X, nfactors = 5, rotate = "varimax", max.iter = 100, fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
    PA1  PA3   PA2   PA4   PA5   h2     u2 com
V1 0.08 0.39 -0.15 -0.02  0.22 0.23  0.771 2.0
V2 0.24 1.02  0.21  0.11 -0.01 1.15 -0.150 1.2
V3 0.98 0.21 -0.03  0.09  0.23 1.06 -0.060 1.2
V4 0.13 0.07  0.13  0.09  0.71 0.55  0.448 1.2
V5 0.33 0.08  0.00  0.95  0.21 1.06 -0.056 1.4
V6 0.49 0.04  0.09  0.18 -0.04 0.28  0.718 1.4
V7 0.73 0.26 -0.03  0.14  0.38 0.76  0.243 1.9
V8 0.14 0.35  0.08  0.21  0.40 0.35  0.649 2.9
V9 0.05 0.04  1.02  0.01  0.16 1.07 -0.070 1.1

                       PA1  PA3  PA2  PA4  PA5
SS loadings           1.93 1.44 1.14 1.02 0.98
Proportion Var        0.21 0.16 0.13 0.11 0.11
Cumulative Var        0.21 0.37 0.50 0.61 0.72
Proportion Explained  0.30 0.22 0.17 0.16 0.15
Cumulative Proportion 0.30 0.52 0.69 0.85 1.00

Mean item complexity =  1.6
Test of the hypothesis that 5 factors are sufficient.

df null model =  36  with the objective function =  3.24 with Chi Square =  1051.61
df of  the model are 1  and the objective function was  0.01 

The root mean square of the residuals (RMSR) is  0.01 
The df corrected root mean square of the residuals is  0.06 

The harmonic n.obs is  329 with the empirical chi square  2.47  with prob <  0.12 
The total n.obs was  329  with Likelihood Chi Square =  4.34  with prob <  0.037 

Tucker Lewis Index of factoring reliability =  0.881
RMSEA index =  0.101  and the 90 % confidence intervals are  0.02 0.206
BIC =  -1.46
Fit based upon off diagonal values = 1
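
The ultra-Heywood warning can be traced in the loading table itself. For this orthogonal (varimax) solution, a variable's communality h2 is the sum of its squared loadings and its uniqueness u2 is 1 - h2, so any h2 above 1 forces a negative uniqueness. The sketch below retypes the printed loadings, which are rounded to two decimals, so the results match the printed h2, u2 and SS loadings only approximately.

```r
# Loadings copied from the printed fa output (rounded to two decimals).
loadings <- matrix(c(
  0.08, 0.39, -0.15, -0.02,  0.22,
  0.24, 1.02,  0.21,  0.11, -0.01,
  0.98, 0.21, -0.03,  0.09,  0.23,
  0.13, 0.07,  0.13,  0.09,  0.71,
  0.33, 0.08,  0.00,  0.95,  0.21,
  0.49, 0.04,  0.09,  0.18, -0.04,
  0.73, 0.26, -0.03,  0.14,  0.38,
  0.14, 0.35,  0.08,  0.21,  0.40,
  0.05, 0.04,  1.02,  0.01,  0.16
), nrow = 9, byrow = TRUE,
dimnames = list(paste0("V", 1:9), c("PA1", "PA3", "PA2", "PA4", "PA5")))

h2 <- rowSums(loadings^2)        # communalities
u2 <- 1 - h2                     # uniquenesses
heywood <- names(h2)[h2 > 1]     # variables with communality above 1
ss_loadings <- colSums(loadings^2)

round(h2, 2)
heywood
round(ss_loadings / nrow(loadings), 2)   # proportion of variance per factor
```

Four variables (V2, V3, V5 and V9) have communalities above 1, which is why the output asks for the results to be examined carefully.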

Factor Analysis Using the Factanal Method

factanal.none <- factanal(X, factors=5, scores = c("regression"), rotation = "varimax")
print(factanal.none)

Call:
factanal(x = X, factors = 5, scores = c("regression"), rotation = "varimax")

Uniquenesses:
   V1    V2    V3    V4    V5    V6    V7    V8    V9 
0.737 0.005 0.005 0.411 0.005 0.691 0.206 0.646 0.005 

Loadings:
   Factor1 Factor2 Factor3 Factor4 Factor5
V1          0.450  -0.150           0.183 
V2  0.250   0.926   0.239   0.122         
V3  0.939   0.240           0.105   0.212 
V4  0.125   0.102   0.130   0.104   0.731 
V5  0.332                   0.916   0.191 
V6  0.509           0.100   0.191         
V7  0.753   0.291           0.131   0.351 
V8  0.146   0.380           0.232   0.355 
V9                  0.982           0.166 

               Factor1 Factor2 Factor3 Factor4 Factor5
SS loadings      1.925   1.368   1.081   0.985   0.931
Proportion Var   0.214   0.152   0.120   0.109   0.103
Cumulative Var   0.214   0.366   0.486   0.595   0.699

Test of the hypothesis that 5 factors are sufficient.
The chi square statistic is 5.59 on 1 degree of freedom.
The p-value is 0.018 
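
Several uniquenesses in this output sit at 0.005, which is the documented default lower bound that factanal places on uniquenesses during optimization. A small check on the printed values shows which variables are affected. The printed numbers are rounded to three decimals, so "at the bound" is an inference from the output rather than a confirmed result.

```r
# Uniquenesses copied from the printed factanal output.
uniq <- c(V1 = 0.737, V2 = 0.005, V3 = 0.005, V4 = 0.411, V5 = 0.005,
          V6 = 0.691, V7 = 0.206, V8 = 0.646, V9 = 0.005)

lower_bound <- 0.005   # factanal's default for the `lower` control setting

# Variables whose uniqueness is at the bound (small tolerance for rounding).
at_bound <- names(uniq)[uniq <= lower_bound + 1e-8]
communality <- 1 - uniq

at_bound
round(communality[at_bound], 3)
```

These are the same four variables (V2, V3, V5, V9) whose communalities exceed 1 in the fa output, so both methods point to the same trouble spots in the 5-factor solution.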

Graph Factor Loading Matrices

fa.diagram(fa.none)