We will read the dataset present in CSV format into R and store it as a variable.
data<- read.csv(file.choose(),header=TRUE)
head(data)
## Price Safety Exterior_Looks Space_comfort Technology After_Sales_Service
## 1 4 4 5 4 3 4
## 2 3 5 3 3 4 4
## 3 4 4 3 4 5 5
## 4 4 4 4 3 3 4
## 5 5 5 4 4 5 4
## 6 4 4 5 3 4 5
## Resale_Value Fuel_Type Fuel_Efficiency Color Maintenance Test_drive
## 1 5 4 4 2 4 2
## 2 3 4 3 4 3 2
## 3 5 4 5 4 5 4
## 4 5 5 4 4 4 2
## 5 5 3 4 5 5 5
## 6 3 4 3 2 3 2
## Product_reviews Testimonials
## 1 4 3
## 2 2 2
## 3 4 3
## 4 5 3
## 5 5 2
## 6 2 3
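Before going further, it can help to confirm that the file loaded as expected. The sketch below is an illustration under stated assumptions, not part of the original workflow: it writes a tiny made-up CSV to a temporary file (standing in for the real survey file, whose path is not given here), reads it back without the interactive dialog, and checks that every column is numeric, has no missing values, and stays within an assumed 1 to 5 rating scale.

```r
# Hypothetical stand-in for the survey file: three columns, made-up ratings.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Price,Safety,Color",
             "4,4,2",
             "3,5,4",
             "4,4,4"), csv_path)

# Reading from an explicit path avoids the interactive file.choose() dialog.
ratings <- read.csv(csv_path, header = TRUE)

# Basic sanity checks before any factor analysis.
all_numeric <- all(vapply(ratings, is.numeric, logical(1)))
no_missing  <- !anyNA(ratings)
in_range    <- all(ratings >= 1 & ratings <= 5)  # assumed 1-5 scale

str(ratings)
c(all_numeric = all_numeric, no_missing = no_missing, in_range = in_range)
```

The same three checks can be applied to the real data frame by replacing ratings with data.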
Now we'll install the packages required for further analysis: psych and GPArotation. The install.packages() calls below are commented out; uncomment them if the packages are not yet installed.
##install.packages("psych")
##install.packages("GPArotation")
library(psych)
library(GPArotation)
Next, we'll decide how many factors to extract. This can be evaluated with methods such as parallel analysis and the eigenvalue criterion.
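The eigenvalue approach is often applied as the Kaiser rule: keep as many factors as there are eigenvalues of the correlation matrix greater than 1. Here is a minimal sketch in base R, using simulated data in place of the survey. The variable names and the two-factor structure are assumptions made purely for illustration.

```r
set.seed(42)

# Simulated stand-in data: 200 respondents, 6 items driven by 2 latent factors.
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.5),
  a2 = f1 + rnorm(n, sd = 0.5),
  a3 = f1 + rnorm(n, sd = 0.5),
  b1 = f2 + rnorm(n, sd = 0.5),
  b2 = f2 + rnorm(n, sd = 0.5),
  b3 = f2 + rnorm(n, sd = 0.5)
)

# Eigenvalues of the correlation matrix, largest first.
ev <- eigen(cor(sim))$values

# Kaiser rule: count eigenvalues above 1.
n_factors_kaiser <- sum(ev > 1)
print(round(ev, 2))
print(n_factors_kaiser)
```

For the survey data, the equivalent one-liner would be sum(eigen(cor(data))$values > 1). The rule is a rough guide and is usually read alongside the scree plot rather than on its own.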
Parallel Analysis
We'll use the psych package's fa.parallel function to run parallel analysis. Here we specify the data frame and the factoring method (minres in our case). Run the following to find an acceptable number of factors and generate the scree plot:
parallel<-fa.parallel(data, fm='minres', fa='fa')
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate =
## rotate, : A loading greater than abs(1) was detected. Examine the loadings
## carefully.
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate =
## rotate, : An ultra-Heywood case was detected. Examine the results carefully
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
## Parallel analysis suggests that the number of factors = 5 and the number of components = NA
The blue line shows eigenvalues of actual data and the two red lines (placed on top of each other) show simulated and resampled data. Here we look at the large drops in the actual data and spot the point where it levels off to the right. Also we locate the point of inflection - the point where the gap between simulated data and actual data tends to be minimum.
Looking at this plot and the parallel analysis, anywhere between 2 and 5 factors would be a good choice.
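To make the comparison between actual and simulated eigenvalues concrete, here is a simplified sketch of the idea behind parallel analysis in base R. It is not how fa.parallel is implemented; it only compares the correlation-matrix eigenvalues of a dataset against the average eigenvalues of random normal data of the same shape. The helper name parallel_sketch and the simulated input are assumptions for illustration.

```r
# Simplified parallel analysis: retain factors whose observed eigenvalue
# exceeds the mean eigenvalue obtained from random data of the same size.
parallel_sketch <- function(x, n_iter = 100) {
  n <- nrow(x)
  p <- ncol(x)
  observed <- eigen(cor(x))$values

  # Eigenvalues from n_iter random normal datasets (one column per dataset).
  random_ev <- replicate(n_iter, {
    noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(noise))$values
  })
  simulated <- rowMeans(random_ev)

  # Count leading eigenvalues that beat the simulated benchmark.
  above  <- observed > simulated
  n_keep <- if (all(above)) p else which(!above)[1] - 1
  list(observed = observed, simulated = simulated, n_factors = n_keep)
}

set.seed(1)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
# Hypothetical data with two underlying factors, three items each.
x <- cbind(f1 + rnorm(n, sd = 0.5), f1 + rnorm(n, sd = 0.5), f1 + rnorm(n, sd = 0.5),
           f2 + rnorm(n, sd = 0.5), f2 + rnorm(n, sd = 0.5), f2 + rnorm(n, sd = 0.5))

result <- parallel_sketch(x)
result$n_factors
```

The observed and simulated vectors in the result correspond to the actual-data and simulated-data lines of the scree plot, so plotting them together gives a rough analogue of that figure.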
Now that we've arrived at a probable number of factors, let's start with 3. To perform factor analysis, we'll use the psych package's fa() function. Given below are the arguments we'll supply:
r - Raw data, or a correlation or covariance matrix
nfactors - Number of factors to extract
rotate - Rotation method; although there are various types of rotation, Varimax and Oblimin are the most popular
fm - Factor extraction technique, such as Minimum Residual (OLS), Maximum Likelihood, or Principal Axis
In this case, we will select oblique rotation (rotate = "oblimin") because we believe the factors are correlated. Note that Varimax rotation assumes the factors are completely uncorrelated. We will use Ordinary Least Squares/minres factoring (fm = "minres"), as it is known to provide results similar to Maximum Likelihood without assuming a multivariate normal distribution, and it derives solutions through iterative eigendecomposition, like principal axis.
threefactor <- fa(data,nfactors = 3,rotate = "oblimin",fm="minres")
print(threefactor)
## Factor Analysis using method = minres
## Call: fa(r = data, nfactors = 3, rotate = "oblimin", fm = "minres")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 MR3 h2 u2 com
## Price 0.44 0.12 -0.19 0.25 0.75 1.5
## Safety -0.23 0.31 -0.11 0.14 0.86 2.1
## Exterior_Looks -0.16 0.18 0.05 0.06 0.94 2.2
## Space_comfort -0.03 0.83 0.04 0.70 0.30 1.0
## Technology 0.09 0.34 0.02 0.13 0.87 1.1
## After_Sales_Service 0.25 0.46 -0.01 0.29 0.71 1.5
## Resale_Value 0.60 -0.16 -0.29 0.48 0.52 1.6
## Fuel_Type 0.03 0.57 -0.13 0.32 0.68 1.1
## Fuel_Efficiency 0.65 0.13 0.16 0.49 0.51 1.2
## Color 0.46 -0.18 0.24 0.27 0.73 1.8
## Maintenance 0.67 0.01 -0.07 0.45 0.55 1.0
## Test_drive 0.19 0.14 0.33 0.19 0.81 2.1
## Product_reviews 0.42 0.13 0.27 0.29 0.71 1.9
## Testimonials -0.03 -0.01 0.74 0.55 0.45 1.0
##
## MR1 MR2 MR3
## SS loadings 2.02 1.61 0.97
## Proportion Var 0.14 0.12 0.07
## Cumulative Var 0.14 0.26 0.33
## Proportion Explained 0.44 0.35 0.21
## Cumulative Proportion 0.44 0.79 1.00
##
## With factor correlations of
## MR1 MR2 MR3
## MR1 1.00 0.07 -0.02
## MR2 0.07 1.00 0.21
## MR3 -0.02 0.21 1.00
##
## Mean item complexity = 1.5
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 91 and the objective function was 2.97 with Chi Square of 247.71
## The degrees of freedom for the model are 52 and the objective function was 0.84
##
## The root mean square of the residuals (RMSR) is 0.07
## The df corrected root mean square of the residuals is 0.09
##
## The harmonic number of observations is 90 with the empirical chi square 71.69 with prob < 0.036
## The total number of observations was 90 with Likelihood Chi Square = 68.46 with prob < 0.063
##
## Tucker Lewis Index of factoring reliability = 0.809
## RMSEA index = 0.07 and the 90 % confidence intervals are 0 0.095
## BIC = -165.53
## Fit based upon off diagonal values = 0.88
## Measures of factor score adequacy
## MR1 MR2 MR3
## Correlation of (regression) scores with factors 0.88 0.89 0.81
## Multiple R square of scores with factors 0.78 0.79 0.66
## Minimum correlation of possible factor scores 0.55 0.57 0.31
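Several columns in this output can be reproduced by hand, which helps when reading it. With an oblique rotation, a variable's communality (h2) is l' Phi l, where l is its row of pattern loadings and Phi is the factor correlation matrix; uniqueness (u2) is 1 - h2; and "Proportion Var" is each factor's SS loading divided by the number of variables (14 here). The sketch below plugs in the rounded values printed above for Price, so it agrees with the table only up to rounding.

```r
# Pattern loadings for Price on MR1, MR2, MR3 (rounded values from the output).
price <- c(MR1 = 0.44, MR2 = 0.12, MR3 = -0.19)

# Factor correlation matrix from the output.
phi <- matrix(c( 1.00, 0.07, -0.02,
                 0.07, 1.00,  0.21,
                -0.02, 0.21,  1.00),
              nrow = 3, byrow = TRUE)

# Communality under an oblique rotation: l' %*% Phi %*% l.
h2 <- as.numeric(t(price) %*% phi %*% price)
u2 <- 1 - h2

# Proportion of total variance per factor: SS loadings / number of variables.
ss_loadings <- c(MR1 = 2.02, MR2 = 1.61, MR3 = 0.97)
prop_var <- ss_loadings / 14

round(c(h2 = h2, u2 = u2), 2)
round(prop_var, 3)
```

A low h2, as for Exterior_Looks (0.06), means the three factors together explain very little of that variable's variance.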
Now we need to keep only the loadings greater than 0.3 in absolute value and check that no variable loads on more than one factor. Negative values are acceptable here. So let's first set a cutoff to improve readability:
print(threefactor$loadings,cutoff = 0.3)
##
## Loadings:
## MR1 MR2 MR3
## Price 0.444
## Safety 0.311
## Exterior_Looks
## Space_comfort 0.832
## Technology 0.342
## After_Sales_Service 0.460
## Resale_Value 0.599
## Fuel_Type 0.573
## Fuel_Efficiency 0.655
## Color 0.464
## Maintenance 0.668
## Test_drive 0.328
## Product_reviews 0.424
## Testimonials 0.742
##
## MR1 MR2 MR3
## SS loadings 2.015 1.605 0.972
## Proportion Var 0.144 0.115 0.069
## Cumulative Var 0.144 0.259 0.328
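The screening rule (loadings above 0.3 in absolute value, on one factor only) can also be applied programmatically instead of by eye. The following is a minimal sketch in base R; the helper name screen_loadings and the toy loading matrix are hypothetical and not taken from the survey results.

```r
# Classify variables by how many factors they load on above a cutoff.
screen_loadings <- function(loadings, cutoff = 0.3) {
  hits <- rowSums(abs(loadings) > cutoff)
  list(
    no_loading    = names(hits)[hits == 0],  # below cutoff on every factor
    cross_loading = names(hits)[hits > 1],   # above cutoff on 2+ factors
    clean         = names(hits)[hits == 1]   # above cutoff on exactly one
  )
}

# Hypothetical loadings for four variables on two factors.
toy <- matrix(c(0.70,  0.05,
                0.10,  0.20,
                0.45, -0.40,
                0.02,  0.65),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("v1", "v2", "v3", "v4"), c("F1", "F2")))

screened <- screen_loadings(toy)
screened
```

With a fitted model, passing unclass(threefactor$loadings) should work, since the loadings are stored as a matrix with variable names as row names.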
As you can see, Exterior_Looks has no loading above the cutoff, and Safety, Technology, and Test_drive only just clear it. Next, we'll consider 4 factors:
fourfactor <- fa(data,nfactors = 4,rotate = "oblimin",fm="minres")
print(fourfactor$loadings,cutoff = 0.3)
##
## Loadings:
## MR1 MR2 MR4 MR3
## Price 0.544
## Safety -0.331 0.358
## Exterior_Looks -0.548
## Space_comfort 0.782
## Technology 0.358
## After_Sales_Service 0.537
## Resale_Value 0.729
## Fuel_Type 0.575
## Fuel_Efficiency 0.434 0.308
## Color 0.731
## Maintenance 0.562
## Test_drive 0.365
## Product_reviews 0.345 0.364
## Testimonials 0.685
##
## MR1 MR2 MR4 MR3
## SS loadings 1.639 1.637 1.053 0.968
## Proportion Var 0.117 0.117 0.075 0.069
## Cumulative Var 0.117 0.234 0.309 0.378
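We can see that most variables now load on a single factor (Safety, Fuel_Efficiency, and Product_reviews still show a second loading near the cutoff). A pattern in which each variable loads on only one factor is known as simple structure.
Run the following to look at the factor mapping: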
fa.diagram(fourfactor)
Now that we’ve achieved simple structure it’s time for us to validate our model. Let’s look at the factor analysis output to proceed:
The root mean square of residuals (RMSR) is 0.05. This is acceptable, as the value should be close to 0. Next, we check the RMSEA (root mean square error of approximation) index. Its value, 0.001, shows good model fit, as it is below 0.05. Finally, the Tucker-Lewis Index (TLI) is 0.93, an acceptable value considering it is over 0.9.
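These rules of thumb can be collected into a small helper so that different models are judged consistently. The sketch below is an assumption-based illustration: the RMSEA and TLI limits follow the paragraph above, while the RMSR limit of 0.08 is an assumed convention, since the text only says the value should be close to 0. The function name check_fit is hypothetical.

```r
# Compare fit indices against rule-of-thumb thresholds.
# RMSEA < 0.05 and TLI > 0.9 follow the text; the RMSR limit of 0.08 is an
# assumed convention, since the text only says the value should be near 0.
check_fit <- function(rmsr, rmsea, tli,
                      rmsr_max = 0.08, rmsea_max = 0.05, tli_min = 0.9) {
  c(RMSR  = rmsr  <= rmsr_max,
    RMSEA = rmsea <  rmsea_max,
    TLI   = tli   >  tli_min)
}

# Values quoted in the paragraph above.
fit_ok <- check_fit(rmsr = 0.05, rmsea = 0.001, tli = 0.93)
fit_ok
```

With a psych fit object, these values should be available as fourfactor$rms, fourfactor$RMSEA[1], and fourfactor$TLI, though it is worth confirming the element names with names(fourfactor) for your installed version.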
After establishing the adequacy of the factors, it's time to name them. This is the theoretical side of the analysis, where we form the factors based on the variable loadings. In this case, here is how the factors can be created:
[Figure: Naming the factors]
In this tutorial we discussed the basic idea of EFA and covered parallel analysis and scree plot interpretation. Then we moved on to factor analysis to achieve simple structure and validated it to ensure the model's adequacy. Finally, we arrived at names for the factors based on the variables.