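# Packages used below: psych (describe, KMO, cortest.bartlett, fa, fa.parallel, fa.diagram), corrplot, ggplot2, rmarkdown (paged_table)
library(psych); library(corrplot); library(ggplot2); library(rmarkdown)
places <- read.table("D:/MARV BS MATH/4th year, 2nd sem/Multivariate/places.txt", header=FALSE, sep = '')
paged_table(places)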
We look at the dataset before we run any analysis.
a <- describe(places)
paged_table(a)
We use the dim function to retrieve the dimension of the dataset.
dim(places)
[1] 329 10
The last column of the data frame, V10, is not part of the analysis. We drop it by using -10 as the column index and save the result to a new object.
place <- places[ , -10]
head(place)
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 521 6200 237 923 4031 2757 996 1405 7633
2 575 8138 1656 886 4883 2438 5564 2632 4350
3 468 7339 618 970 2531 2560 237 859 5250
4 476 7908 1431 610 6883 3399 4655 1617 5864
5 659 8393 1853 1483 6558 3026 4496 2612 5727
6 520 5819 640 727 2444 2972 334 1018 5254
We should take a look at the correlations among our variables to determine if factor analysis is appropriate.
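datamatrix <- cor(place)   # V10 was already dropped, so no further column index is needed
corrplot(datamatrix, method="number")
X <- place                 # the nine variables used in the factor analysis
Y <- place[,9]             # V9 on its own (not used in the steps shown here)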
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is a better indicator of factorability than a visual inspection of the correlations.
KMO(r=cor(X))
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = cor(X))
Overall MSA = 0.7
MSA for each item =
V1 V2 V3 V4 V5 V6 V7 V8 V9
0.55 0.69 0.69 0.64 0.86 0.74 0.71 0.81 0.38
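The overall KMO is 0.7, indicating that, based on this test, we can probably conduct a factor analysis. Note that V9 has an individual MSA of only 0.38; values below 0.5 are usually considered unacceptable, so this variable deserves a closer look.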
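To make the MSA values less of a black box, here is a minimal base-R sketch of the usual KMO definition: the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations (off-diagonal entries only). The helper name `kmo_by_hand` is hypothetical, and the built-in `mtcars` data is used only as a stand-in for the places data, so the numbers will not match the output above.

```r
# Hypothetical helper: KMO / MSA computed directly from a correlation matrix.
kmo_by_hand <- function(R) {
  R_inv <- solve(R)                                   # inverse correlation matrix
  # Partial correlations of each pair, controlling for all other variables
  P <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
  diag(R) <- 0                                        # keep off-diagonal terms only
  diag(P) <- 0
  r2 <- R^2
  p2 <- P^2
  list(
    overall  = sum(r2) / (sum(r2) + sum(p2)),         # overall MSA
    per_item = colSums(r2) / (colSums(r2) + colSums(p2))  # MSA for each item
  )
}

# Stand-in data (an assumption, not the places data)
demo <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
res <- kmo_by_hand(cor(demo))
print(round(res$overall, 2))
print(round(res$per_item, 2))
```

With the places data, `kmo_by_hand(cor(X))` should be comparable to the `KMO(r=cor(X))` output, assuming both follow the same definition.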
cortest.bartlett(X)
R was not square, finding R from data
$chisq
[1] 1051.614
$p.value
[1] 2.302138e-197
$df
[1] 36
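This is Bartlett's test of sphericity, which tests whether the correlation matrix is an identity matrix. The very small p-value (2.30e-197 < 0.05) rejects that hypothesis, which indicates that a factor analysis may be useful with our data.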
det(cor(X))
[1] 0.03900546
The determinant is positive, so the correlation matrix is not singular and the factor analysis will probably run.
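The Bartlett statistic and the determinant are directly related. As a quick check, the sketch below recomputes the chi-square from the numbers already printed above (n = 329 rows, p = 9 variables, determinant 0.03900546), assuming the standard formula chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom.

```r
n     <- 329          # number of rows, from dim(places)
p     <- 9            # number of variables in X
det_R <- 0.03900546   # det(cor(X)) as printed above

# Bartlett's test of sphericity, computed by hand
chisq   <- -(n - 1 - (2 * p + 5) / 6) * log(det_R)
df      <- p * (p - 1) / 2
p_value <- pchisq(chisq, df, lower.tail = FALSE)

print(chisq)    # should agree with $chisq above up to rounding
print(df)
print(p_value)
```

A determinant close to 0 makes ln|R| strongly negative, which is why highly correlated variables give a large chi-square here.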
fafitfree <- fa(X, nfactors = ncol(X), rotate = "none")
n_factors <- length(fafitfree$e.values)
scree <- data.frame(
  Factor_n = as.factor(1:n_factors),
  Eigenvalue = fafitfree$e.values)
ggplot(scree, aes(x = Factor_n, y = Eigenvalue, group = 1)) +
  geom_point() + geom_line() +
  xlab("Number of factors") +
  ylab("Initial eigenvalue") +
  labs(title = "Scree Plot",
       subtitle = "(Based on the unreduced correlation matrix)")
parallel <- fa.parallel(X)
Parallel analysis suggests that the number of factors = 5 and the number of components = 1
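The idea behind parallel analysis can be sketched in base R: we compare the observed eigenvalues with eigenvalues obtained from random data of the same size and keep only those that exceed the random ones. This is a simplified sketch for components only, assuming a 95th-percentile threshold and normally distributed noise; it is not the exact procedure inside `fa.parallel`, and `mtcars` is again only a stand-in for the places data.

```r
set.seed(1)                                    # make the simulation reproducible

demo <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]  # stand-in data
n <- nrow(demo)
p <- ncol(demo)

# Eigenvalues of the observed (unreduced) correlation matrix
obs_eig <- eigen(cor(demo), only.values = TRUE)$values

# Eigenvalues from uncorrelated random data with the same dimensions
n_sim <- 200
sim_eig <- replicate(n_sim, {
  noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(noise), only.values = TRUE)$values
})                                             # p x n_sim matrix

# 95th percentile of the random eigenvalues at each position
threshold <- apply(sim_eig, 1, quantile, probs = 0.95)

# Keep components until the first one that fails to beat the threshold
exceeds <- obs_eig > threshold
n_components <- if (all(exceeds)) p else which(!exceeds)[1] - 1
print(n_components)
```

For the number of factors, `fa.parallel` works with eigenvalues of a reduced correlation matrix, which is one reason its factor and component suggestions can differ as they do above.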
fa.none <- fa(r=X,
              nfactors = 5,
              # covar = FALSE, SMC = TRUE,
              fm="pa",            # factoring method ("pa" is principal axis factoring)
              max.iter=100,       # the default is 50; we raise it to 100
              rotate="varimax")   # varimax (orthogonal) rotation
maximum iteration exceeded
print(fa.none)
Factor Analysis using method = pa
Call: fa(r = X, nfactors = 5, rotate = "varimax", max.iter = 100, fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
PA1 PA3 PA2 PA4 PA5 h2 u2 com
V1 0.08 0.39 -0.15 -0.02 0.22 0.23 0.771 2.0
V2 0.24 1.02 0.21 0.11 -0.01 1.15 -0.150 1.2
V3 0.98 0.21 -0.03 0.09 0.23 1.06 -0.060 1.2
V4 0.13 0.07 0.13 0.09 0.71 0.55 0.448 1.2
V5 0.33 0.08 0.00 0.95 0.21 1.06 -0.056 1.4
V6 0.49 0.04 0.09 0.18 -0.04 0.28 0.718 1.4
V7 0.73 0.26 -0.03 0.14 0.38 0.76 0.243 1.9
V8 0.14 0.35 0.08 0.21 0.40 0.35 0.649 2.9
V9 0.05 0.04 1.02 0.01 0.16 1.07 -0.070 1.1
PA1 PA3 PA2 PA4 PA5
SS loadings 1.93 1.44 1.14 1.02 0.98
Proportion Var 0.21 0.16 0.13 0.11 0.11
Cumulative Var 0.21 0.37 0.50 0.61 0.72
Proportion Explained 0.30 0.22 0.17 0.16 0.15
Cumulative Proportion 0.30 0.52 0.69 0.85 1.00
Mean item complexity = 1.6
Test of the hypothesis that 5 factors are sufficient.
df null model = 36 with the objective function = 3.24 with Chi Square = 1051.61
df of the model are 1 and the objective function was 0.01
The root mean square of the residuals (RMSR) is 0.01
The df corrected root mean square of the residuals is 0.06
The harmonic n.obs is 329 with the empirical chi square 2.47 with prob < 0.12
The total n.obs was 329 with Likelihood Chi Square = 4.34 with prob < 0.037
Tucker Lewis Index of factoring reliability = 0.881
RMSEA index = 0.101 and the 90 % confidence intervals are 0.02 0.206
BIC = -1.46
Fit based upon off diagonal values = 1
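The h2 (communality) and u2 (uniqueness) columns follow directly from the loadings: h2 is the row sum of squared loadings and u2 = 1 - h2. The sketch below re-enters the rounded loadings printed above and flags variables with h2 above 1. Such values (often called Heywood cases), together with the "maximum iteration exceeded" message, suggest that this solution should be read with caution. Because the loadings are rounded to two decimals, the results only match the printed h2 approximately.

```r
# Rounded loadings copied from the print(fa.none) output above
loadings <- matrix(c(
  0.08, 0.39, -0.15, -0.02,  0.22,
  0.24, 1.02,  0.21,  0.11, -0.01,
  0.98, 0.21, -0.03,  0.09,  0.23,
  0.13, 0.07,  0.13,  0.09,  0.71,
  0.33, 0.08,  0.00,  0.95,  0.21,
  0.49, 0.04,  0.09,  0.18, -0.04,
  0.73, 0.26, -0.03,  0.14,  0.38,
  0.14, 0.35,  0.08,  0.21,  0.40,
  0.05, 0.04,  1.02,  0.01,  0.16),
  ncol = 5, byrow = TRUE,
  dimnames = list(paste0("V", 1:9), c("PA1", "PA3", "PA2", "PA4", "PA5")))

h2 <- rowSums(loadings^2)      # communalities
u2 <- 1 - h2                   # uniquenesses
ss <- colSums(loadings^2)      # "SS loadings" row of the output

heywood <- names(h2)[h2 > 1]   # variables with communality above 1
print(round(cbind(h2, u2), 2))
print(round(ss, 2))
print(heywood)
```

On the fitted object itself, the same quantities should be available as `fa.none$communality` and `fa.none$uniquenesses`.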
factanal.none <- factanal(X, factors=5, scores = c("regression"), rotation = "varimax")
print(factanal.none)
Call:
factanal(x = X, factors = 5, scores = c("regression"), rotation = "varimax")
Uniquenesses:
V1 V2 V3 V4 V5 V6 V7 V8 V9
0.737 0.005 0.005 0.411 0.005 0.691 0.206 0.646 0.005
Loadings:
   Factor1 Factor2 Factor3 Factor4 Factor5
V1          0.450  -0.150           0.183
V2  0.250   0.926   0.239   0.122
V3  0.939   0.240           0.105   0.212
V4  0.125   0.102   0.130   0.104   0.731
V5  0.332                   0.916   0.191
V6  0.509           0.100   0.191
V7  0.753   0.291           0.131   0.351
V8  0.146   0.380           0.232   0.355
V9                  0.982           0.166
Factor1 Factor2 Factor3 Factor4 Factor5
SS loadings 1.925 1.368 1.081 0.985 0.931
Proportion Var 0.214 0.152 0.120 0.109 0.103
Cumulative Var 0.214 0.366 0.486 0.595 0.699
Test of the hypothesis that 5 factors are sufficient.
The chi square statistic is 5.59 on 1 degree of freedom.
The p-value is 0.018
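Both numbers in this test can be reproduced by hand. The sketch below assumes the usual degrees-of-freedom formula for a model with k factors and p variables, df = ((p - k)^2 - (p + k)) / 2, and takes the chi-square statistic from the output above.

```r
p <- 9   # number of variables
k <- 5   # number of factors

# Degrees of freedom of the factor model
df <- ((p - k)^2 - (p + k)) / 2

# p-value for the reported chi-square statistic
chisq   <- 5.59
p_value <- pchisq(chisq, df, lower.tail = FALSE)

print(df)
print(round(p_value, 3))
```

With only 1 degree of freedom, five factors is close to the largest model that nine variables can support, and a p-value below 0.05 formally rejects the hypothesis that five factors are sufficient.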
fa.diagram(fa.none)