Private domestic airline operators raised their airfares in December 2000. Indian Airlines, the government-owned carrier, was the only domestic airline that did not follow suit. Available information showed that people still preferred to fly Jet Airways. Our group set out to ascertain the reasons for this preference, since according to Indian Airlines officials domestic flyers are price-conscious customers.
Our study consisted of 20 respondents who had recently flown with Jet Airways. They were asked to indicate, on a seven-point scale (1 = completely agree, 7 = completely disagree), their agreement or disagreement with a set of 10 statements relating to their perceptions of the airline and its attributes.
The 10 statements were as follows:
Reading Data
raqData <- read.csv("Airlines.csv",header = TRUE)
Calculate the correlation matrix
raqmatrix <- round(cor(raqData),2)
raqmatrix
## Ontime Comfort Food Hostess Boss Airyoung Freqfly Schedule
## Ontime 1.00 0.14 0.87 -0.06 0.40 0.90 -0.02 0.07
## Comfort 0.14 1.00 -0.05 0.19 -0.16 0.06 0.08 0.92
## Food 0.87 -0.05 1.00 0.00 0.36 0.85 0.05 -0.13
## Hostess -0.06 0.19 0.00 1.00 0.08 -0.10 0.95 0.26
## Boss 0.40 -0.16 0.36 0.08 1.00 0.50 0.07 -0.25
## Airyoung 0.90 0.06 0.85 -0.10 0.50 1.00 -0.05 -0.08
## Freqfly -0.02 0.08 0.05 0.95 0.07 -0.05 1.00 0.16
## Schedule 0.07 0.92 -0.13 0.26 -0.25 -0.08 0.16 1.00
## Mom -0.22 -0.07 -0.03 -0.32 -0.14 -0.11 -0.26 -0.13
## Lifestyl 0.01 0.17 0.04 0.89 0.05 -0.03 0.97 0.26
## Mom Lifestyl
## Ontime -0.22 0.01
## Comfort -0.07 0.17
## Food -0.03 0.04
## Hostess -0.32 0.89
## Boss -0.14 0.05
## Airyoung -0.11 -0.03
## Freqfly -0.26 0.97
## Schedule -0.13 0.26
## Mom 1.00 -0.23
## Lifestyl -0.23 1.00
Run Bartlett's test using the psych package
library(psych)
## Warning: package 'psych' was built under R version 3.6.1
cortest.bartlett(raqData)
## R was not square, finding R from data
## $chisq
## [1] 194.4825
##
## $p.value
## [1] 1.727031e-20
##
## $df
## [1] 45
Bartlett's test is highly significant (p < 0.001), so the correlation matrix differs from an identity matrix and factor analysis is appropriate.
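To make the statistic less of a black box, here is a minimal base R sketch of Bartlett's test of sphericity computed by hand. The helper name `bartlett_sphericity` and the toy data are assumptions for illustration only (they are not part of psych or of the airline survey); the formula is the standard chi-square approximation with p(p - 1)/2 degrees of freedom, which gives 45 for 10 variables.

```r
# Bartlett's test of sphericity computed by hand (base R only).
# `bartlett_sphericity` is a hypothetical helper name, not a psych function.
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)                                        # number of variables
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))   # test statistic
  df <- p * (p - 1) / 2                               # 45 when p = 10
  p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
  list(chisq = chisq, df = df, p.value = p_value)
}

# Synthetic illustration data (NOT the airline survey)
set.seed(1)
z <- rnorm(20)
toy <- data.frame(a = z + rnorm(20, sd = 0.3),
                  b = z + rnorm(20, sd = 0.3),
                  c = rnorm(20))

res <- bartlett_sphericity(cor(toy), n = nrow(toy))
res
```

Passing `cor(raqData)` and `n = nrow(raqData)` to the same helper should give figures comparable to the `cortest.bartlett()` output, assuming the data contain no missing values.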
KMO Test
# KMO Kaiser-Meyer-Olkin Measure of Sampling Adequacy
# Function by G. Jay Kerns, Ph.D., Youngstown State University (http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22816.html)
kmo = function( data ){
library(MASS)
X <- cor(as.matrix(data))
iX <- ginv(X)
S2 <- diag(diag((iX^-1)))
AIS <- S2%*%iX%*%S2 # anti-image covariance matrix
IS <- X+AIS-2*S2 # image covariance matrix
Dai <- sqrt(diag(diag(AIS)))
IR <- ginv(Dai)%*%IS%*%ginv(Dai) # image correlation matrix
AIR <- ginv(Dai)%*%AIS%*%ginv(Dai) # anti-image correlation matrix
a <- apply((AIR - diag(diag(AIR)))^2, 2, sum)
AA <- sum(a)
b <- apply((X - diag(nrow(X)))^2, 2, sum)
BB <- sum(b)
MSA <- b/(b+a) # indiv. measures of sampling adequacy
AIR <- AIR-diag(nrow(AIR))+diag(MSA)
# Examine the anti-image of the correlation matrix. That is the negative of the partial correlations, partialling out all other variables.
kmo <- BB/(AA+BB) # overall KMO statistic
# Reporting the conclusion
if (kmo >= 0.00 && kmo < 0.50){test <- 'The KMO test yields a degree of common variance unacceptable for FA.'}
else if (kmo >= 0.50 && kmo < 0.60){test <- 'The KMO test yields a degree of common variance miserable.'}
else if (kmo >= 0.60 && kmo < 0.70){test <- 'The KMO test yields a degree of common variance mediocre.'}
else if (kmo >= 0.70 && kmo < 0.80){test <- 'The KMO test yields a degree of common variance middling.' }
else if (kmo >= 0.80 && kmo < 0.90){test <- 'The KMO test yields a degree of common variance meritorious.' }
else { test <- 'The KMO test yields a degree of common variance marvelous.' }
ans <- list( overall = kmo,
report = test,
individual = MSA,
AIS = AIS,
AIR = AIR )
return(ans)
}
#To use this function:
kmo(raqData)
## $overall
## [1] 0.5179028
##
## $report
## [1] "The KMO test yields a degree of common variance miserable."
##
## $individual
## Ontime Comfort Food Hostess Boss Airyoung Freqfly
## 0.5574881 0.4604544 0.5940986 0.4705232 0.4816402 0.7302882 0.4660346
## Schedule Mom Lifestyl
## 0.5138174 0.2851663 0.5161732
##
## $AIS
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.080685725 -0.001754032 -0.066737059 0.019404151 -0.04325037
## [2,] -0.001754032 0.106015361 0.026223810 -0.019024735 0.03157028
## [3,] -0.066737059 0.026223810 0.127700180 -0.022134807 0.07957348
## [4,] 0.019404151 -0.019024735 -0.022134807 0.027050858 -0.05130881
## [5,] -0.043250368 0.031570283 0.079573484 -0.051308815 0.53173568
## [6,] -0.047619942 -0.042776739 -0.031123114 0.003520371 -0.07634157
## [7,] -0.008309901 0.008937190 0.008023899 -0.012921104 0.02342413
## [8,] -0.030850555 -0.080341384 0.011160699 -0.009269619 0.04505619
## [9,] 0.107278418 -0.057810992 -0.129112274 0.051193132 -0.07169968
## [10,] 0.010965535 -0.010465889 -0.009331248 0.016701767 -0.03140171
## [,6] [,7] [,8] [,9] [,10]
## [1,] -4.761994e-02 -8.309901e-03 -0.030850555 0.107278418 0.010965535
## [2,] -4.277674e-02 8.937190e-03 -0.080341384 -0.057810992 -0.010465889
## [3,] -3.112311e-02 8.023899e-03 0.011160699 -0.129112274 -0.009331248
## [4,] 3.520371e-03 -1.292110e-02 -0.009269619 0.051193132 0.016701767
## [5,] -7.634157e-02 2.342413e-02 0.045056194 -0.071699678 -0.031401707
## [6,] 1.234105e-01 5.486839e-05 0.042757206 -0.003881902 -0.002183054
## [7,] 5.486839e-05 7.553280e-03 0.006140812 -0.017510153 -0.011443259
## [8,] 4.275721e-02 6.140812e-03 0.102383519 0.001636157 -0.011846440
## [9,] -3.881902e-03 -1.751015e-02 0.001636157 0.573978329 0.018019865
## [10,] -2.183054e-03 -1.144326e-02 -0.011846440 0.018019865 0.019493716
##
## $AIR
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.55748807 -0.01896508 -0.65746548 0.4153419 -0.2088063
## [2,] -0.01896508 0.46045438 0.22538014 -0.3552579 0.1329677
## [3,] -0.65746548 0.22538014 0.59409865 -0.3766079 0.3053688
## [4,] 0.41534192 -0.35525787 -0.37660793 0.4705232 -0.4278126
## [5,] -0.20880628 0.13296771 0.30536884 -0.4278126 0.4816402
## [6,] -0.47721544 -0.37397904 -0.24791983 0.0609287 -0.2980144
## [7,] -0.33661210 0.31582667 0.25835801 -0.9039436 0.3696135
## [8,] -0.33942952 -0.77115182 0.09760693 -0.1761393 0.1931041
## [9,] 0.49850126 -0.23435719 -0.47689638 0.4108407 -0.1297841
## [10,] 0.27649301 -0.23022071 -0.18702371 0.7273184 -0.3084310
## [,6] [,7] [,8] [,9] [,10]
## [1,] -0.477215443 -0.336612104 -0.339429516 0.498501258 0.27649301
## [2,] -0.373979042 0.315826674 -0.771151817 -0.234357185 -0.23022071
## [3,] -0.247919833 0.258358006 0.097606925 -0.476896382 -0.18702371
## [4,] 0.060928700 -0.903943611 -0.176139333 0.410840674 0.72731841
## [5,] -0.298014359 0.369613495 0.193104136 -0.129784119 -0.30843101
## [6,] 0.730288150 0.001797125 0.380380526 -0.014585481 -0.04450831
## [7,] 0.001797125 0.466034647 0.220822327 -0.265934336 -0.94304979
## [8,] 0.380380526 0.220822327 0.513817414 0.006749354 -0.26517069
## [9,] -0.014585481 -0.265934336 0.006749354 0.285166299 0.17035562
## [10,] -0.044508312 -0.943049788 -0.265170694 0.170355616 0.51617323
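The overall KMO is 0.52, just above the 0.5 threshold, so we proceed with the analysis. Note, however, that several individual values (Comfort, Hostess, Boss, Freqfly and Mom) are below 0.5, with Mom the lowest at 0.29.
Let's look at the determinant of the correlation matrix.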
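The individual measures of sampling adequacy printed under `$individual` can be screened against the 0.5 cut-off with a few lines of base R. The values below are copied from the output above; the variable names `msa`, `threshold` and `below` are just illustrative.

```r
# Individual MSA values copied from the kmo(raqData) output
msa <- c(Ontime = 0.5574881, Comfort = 0.4604544, Food = 0.5940986,
         Hostess = 0.4705232, Boss = 0.4816402, Airyoung = 0.7302882,
         Freqfly = 0.4660346, Schedule = 0.5138174, Mom = 0.2851663,
         Lifestyl = 0.5161732)

threshold <- 0.5                       # conventional minimum for KMO/MSA
below <- sort(msa[msa < threshold])    # items under the cut-off, lowest first
names(below)
round(below, 2)
```

Items that fall under the cut-off are candidates for closer inspection before interpreting the factor solution.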
det(raqmatrix)
## [1] 2.659787e-06
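The determinant (about 0.0000027) is smaller than the usual 0.00001 cut-off, which points to multicollinearity among some items (several correlations in the matrix are 0.9 or higher). We proceed with that caveat in mind. Now extracting factors: first, set the number of factors equal to the number of variables.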
pc1 <- principal(raqData,nfactors=10,rotate="none")
pc1
## Principal Components Analysis
## Call: principal(r = raqData, nfactors = 10, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 h2
## Ontime 0.06 0.94 0.23 -0.01 -0.15 -0.01 0.13 0.13 -0.09 -0.01 1
## Comfort 0.41 -0.04 0.88 0.02 0.18 -0.03 -0.12 -0.05 -0.12 0.00 1
## Food 0.03 0.91 0.02 0.23 -0.23 0.21 -0.02 -0.12 0.01 0.00 1
## Hostess 0.94 -0.05 -0.24 0.05 0.02 0.13 -0.16 0.10 0.02 -0.03 1
## Boss 0.04 0.61 -0.30 -0.18 0.71 0.05 0.04 -0.02 0.00 0.00 1
## Airyoung -0.02 0.95 0.12 0.07 -0.02 -0.21 -0.13 0.00 0.09 0.00 1
## Freqfly 0.93 -0.01 -0.34 0.15 -0.05 -0.03 0.02 0.02 -0.01 0.05 1
## Schedule 0.48 -0.16 0.83 -0.01 0.09 0.07 0.12 0.02 0.13 0.00 1
## Mom -0.40 -0.18 0.03 0.87 0.22 0.00 0.01 0.04 0.00 0.00 1
## Lifestyl 0.93 -0.01 -0.23 0.18 -0.02 -0.13 0.13 -0.10 -0.02 -0.03 1
## u2 com
## Ontime -2.2e-16 1.3
## Comfort -1.6e-15 1.6
## Food -2.9e-15 1.4
## Hostess -8.9e-16 1.3
## Boss 2.2e-16 2.5
## Airyoung -2.4e-15 1.2
## Freqfly -4.4e-16 1.3
## Schedule 0.0e+00 1.8
## Mom -2.0e-15 1.7
## Lifestyl -4.4e-16 1.3
##
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
## SS loadings 3.18 3.05 1.84 0.91 0.67 0.13 0.11 0.06 0.05 0
## Proportion Var 0.32 0.30 0.18 0.09 0.07 0.01 0.01 0.01 0.00 0
## Cumulative Var 0.32 0.62 0.81 0.90 0.96 0.98 0.99 0.99 1.00 1
## Proportion Explained 0.32 0.30 0.18 0.09 0.07 0.01 0.01 0.01 0.00 0
## Cumulative Proportion 0.32 0.62 0.81 0.90 0.96 0.98 0.99 0.99 1.00 1
##
## Mean item complexity = 1.5
## Test of the hypothesis that 10 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0
## with the empirical chi square 0 with prob < NA
##
## Fit based upon off diagonal values = 1
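The eigenvalue rule can be applied directly to the "SS loadings" row above. This is a small sketch that retypes the printed (rounded) eigenvalues rather than reading them from `pc1$values`, so the figures are approximate to two decimals.

```r
# Eigenvalues copied from the "SS loadings" row of the pc1 output
eigenvalues <- c(3.18, 3.05, 1.84, 0.91, 0.67, 0.13, 0.11, 0.06, 0.05, 0)

n_keep   <- sum(eigenvalues > 1)            # Kaiser rule: keep eigenvalues > 1
prop_var <- eigenvalues / sum(eigenvalues)  # share of variance per component
cum_var  <- cumsum(prop_var)                # running total of explained variance

n_keep
round(cum_var[n_keep], 2)
```

With the live object, `sum(pc1$values > 1)` would perform the same count without retyping the numbers.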
The SS loadings (eigenvalues) suggest that the first three components are sufficient, since only they exceed 1. The fit based on off-diagonal values is also greater than 0.96.
Scree plot
plot(pc1$values,type="b")
The scree plot suggests 6 components. Following the eigenvalues, we rerun the analysis with 3 components.
pc2 <- principal(raqData, nfactors = 3, rotate = "none")
pc2
## Principal Components Analysis
## Call: principal(r = raqData, nfactors = 3, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 PC3 h2 u2 com
## Ontime 0.06 0.94 0.23 0.93 0.067 1.1
## Comfort 0.41 -0.04 0.88 0.93 0.066 1.4
## Food 0.03 0.91 0.02 0.84 0.165 1.0
## Hostess 0.94 -0.05 -0.24 0.94 0.057 1.1
## Boss 0.04 0.61 -0.30 0.46 0.538 1.5
## Airyoung -0.02 0.95 0.12 0.92 0.078 1.0
## Freqfly 0.93 -0.01 -0.34 0.97 0.029 1.3
## Schedule 0.48 -0.16 0.83 0.95 0.045 1.7
## Mom -0.40 -0.18 0.03 0.19 0.807 1.4
## Lifestyl 0.93 -0.01 -0.23 0.92 0.077 1.1
##
## PC1 PC2 PC3
## SS loadings 3.18 3.05 1.84
## Proportion Var 0.32 0.30 0.18
## Cumulative Var 0.32 0.62 0.81
## Proportion Explained 0.39 0.38 0.23
## Cumulative Proportion 0.39 0.77 1.00
##
## Mean item complexity = 1.3
## Test of the hypothesis that 3 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.06
## with the empirical chi square 6.16 with prob < 1
##
## Fit based upon off diagonal values = 0.98
Let's check whether the three-component solution is adequate by computing the reproduced correlation matrix from pc2.
loadings <- factor.model(pc2$loadings)
communality <- diag(loadings)
communality
## Ontime Comfort Food Hostess Boss Airyoung Freqfly
## 0.9327081 0.9344951 0.8351882 0.9434903 0.4622671 0.9223321 0.9711102
## Schedule Mom Lifestyl
## 0.9547383 0.1925121 0.9232052
The diagonal of this matrix contains the communalities after extraction. Next, let's look at the difference between the observed and reproduced correlations, which is called the residual.
residuals <- factor.residuals(raqmatrix,pc2$loadings)
uniqueness <- diag(residuals)
uniqueness
## Ontime Comfort Food Hostess Boss Airyoung
## 0.06729189 0.06550491 0.16481183 0.05650968 0.53773288 0.07766786
## Freqfly Schedule Mom Lifestyl
## 0.02888985 0.04526174 0.80748791 0.07679481
The diagonal of this matrix is the uniqueness.
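Communality and uniqueness can also be recovered directly from the loadings: each communality is the sum of squared loadings in a row, and the uniqueness is one minus that. The sketch below retypes the rounded unrotated loadings from the pc2 output, so the results should match the printed values only to about two decimals; the object names `L`, `h2` and `u2` are illustrative.

```r
# Unrotated loadings copied (rounded to 2 d.p.) from the pc2 output
L <- matrix(c( 0.06,  0.94,  0.23,
               0.41, -0.04,  0.88,
               0.03,  0.91,  0.02,
               0.94, -0.05, -0.24,
               0.04,  0.61, -0.30,
              -0.02,  0.95,  0.12,
               0.93, -0.01, -0.34,
               0.48, -0.16,  0.83,
              -0.40, -0.18,  0.03,
               0.93, -0.01, -0.23),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Ontime", "Comfort", "Food", "Hostess", "Boss",
                              "Airyoung", "Freqfly", "Schedule", "Mom",
                              "Lifestyl"),
                            c("PC1", "PC2", "PC3")))

h2 <- rowSums(L^2)   # communality: variance explained by the 3 components
u2 <- 1 - h2         # uniqueness: variance left unexplained
round(cbind(h2, u2), 2)
```

Boss and Mom stand out with low communalities, meaning the three components capture little of their variance.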
residuals<-as.matrix(residuals[upper.tri(residuals)])
This command re-creates the object residuals by using only the upper triangle of the original matrix. We now have an object called residuals that contains the residuals stored in a column. This is handy because it makes it easy to calculate various things.
large.resid<-abs(residuals) > 0.05
# proportion of the large residuals
sum(large.resid)/nrow(residuals)
## [1] 0.2666667
Some other residuals stats, such as the mean, are skipped here.
We can set rotate = "varimax" in the principal() function to rotate the components. The full output is long, so the print.psych() command is then used to print the loading matrix of the model pc3, displaying only loadings above .3 in absolute value (cut = 0.3). Adding sort = TRUE would also sort items by the size of their loadings.
pc3 <- principal(raqData, nfactors=3, rotate="varimax")
pc3
## Principal Components Analysis
## Call: principal(r = raqData, nfactors = 3, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC2 RC1 RC3 h2 u2 com
## Ontime 0.95 0.00 0.15 0.93 0.067 1.1
## Comfort 0.04 0.09 0.96 0.93 0.066 1.0
## Food 0.91 0.04 -0.05 0.84 0.165 1.0
## Hostess -0.06 0.96 0.10 0.94 0.057 1.0
## Boss 0.58 0.15 -0.33 0.46 0.538 1.7
## Airyoung 0.96 -0.04 0.02 0.92 0.078 1.0
## Freqfly -0.03 0.99 -0.01 0.97 0.029 1.0
## Schedule -0.08 0.18 0.96 0.95 0.045 1.1
## Mom -0.18 -0.39 -0.09 0.19 0.807 1.5
## Lifestyl -0.02 0.96 0.10 0.92 0.077 1.0
##
## RC2 RC1 RC3
## SS loadings 3.04 3.03 2.00
## Proportion Var 0.30 0.30 0.20
## Cumulative Var 0.30 0.61 0.81
## Proportion Explained 0.38 0.38 0.25
## Cumulative Proportion 0.38 0.75 1.00
##
## Mean item complexity = 1.1
## Test of the hypothesis that 3 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.06
## with the empirical chi square 6.16 with prob < 1
##
## Fit based upon off diagonal values = 0.98
print.psych(pc3, cut = 0.3)
## Principal Components Analysis
## Call: principal(r = raqData, nfactors = 3, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            RC2   RC1   RC3   h2    u2 com
## Ontime    0.95             0.93 0.067 1.1
## Comfort               0.96 0.93 0.066 1.0
## Food      0.91             0.84 0.165 1.0
## Hostess         0.96       0.94 0.057 1.0
## Boss      0.58       -0.33 0.46 0.538 1.7
## Airyoung  0.96             0.92 0.078 1.0
## Freqfly         0.99       0.97 0.029 1.0
## Schedule              0.96 0.95 0.045 1.1
## Mom            -0.39       0.19 0.807 1.5
## Lifestyl        0.96       0.92 0.077 1.0
##
## RC2 RC1 RC3
## SS loadings 3.04 3.03 2.00
## Proportion Var 0.30 0.30 0.20
## Cumulative Var 0.30 0.61 0.81
## Proportion Explained 0.38 0.38 0.25
## Cumulative Proportion 0.38 0.75 1.00
##
## Mean item complexity = 1.1
## Test of the hypothesis that 3 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.06
## with the empirical chi square 6.16 with prob < 1
##
## Fit based upon off diagonal values = 0.98
The three factors extracted together account for about 81% of the variance. Factor 1 is a combination of Ontime, Food and Airyoung. We can classify it as "Customer Service".
Factor 2 is a combination of Hostess, Freqfly and Lifestyl. This can be classified as "Flyer Incentive".
Factor 3 is a combination of Comfort and Schedule. We can classify it as "Convenience".
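The grouping of items into factors can be reproduced mechanically by assigning each item to the rotated component on which it has the largest absolute loading. This is only a sketch: the loadings are retyped from the varimax output, and the names `rc`, `dominant` and `groups` are illustrative.

```r
# Rotated loadings copied from the pc3 (varimax) output
rc <- matrix(c( 0.95,  0.00,  0.15,
                0.04,  0.09,  0.96,
                0.91,  0.04, -0.05,
               -0.06,  0.96,  0.10,
                0.58,  0.15, -0.33,
                0.96, -0.04,  0.02,
               -0.03,  0.99, -0.01,
               -0.08,  0.18,  0.96,
               -0.18, -0.39, -0.09,
               -0.02,  0.96,  0.10),
             ncol = 3, byrow = TRUE,
             dimnames = list(c("Ontime", "Comfort", "Food", "Hostess", "Boss",
                               "Airyoung", "Freqfly", "Schedule", "Mom",
                               "Lifestyl"),
                             c("RC2", "RC1", "RC3")))

# Component with the largest absolute loading for each item
dominant <- colnames(rc)[apply(abs(rc), 1, which.max)]
groups <- split(rownames(rc), dominant)
groups
```

Under this rule Boss falls in with RC2 and Mom with RC1, but their loadings (0.58 and -0.39) are much weaker than the rest, which is consistent with leaving them out of the factor labels.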
head(pc2$scores) # access scores by pc2$scores
## PC1 PC2 PC3
## [1,] -0.32868386 -0.8521834 0.1589147
## [2,] -0.76859062 0.1638226 -0.6895576
## [3,] 0.18285990 -0.4793819 -0.1301905
## [4,] 0.20381401 0.4823934 1.7012390
## [5,] 0.05296912 1.4490622 0.3696490
## [6,] -0.31880402 2.4967905 0.8699729
raqData1 <- cbind(raqData[,1:10], pc2$scores)
# bind the factor scores to raqData dataframe for other use
head(raqData1)
## Ontime Comfort Food Hostess Boss Airyoung Freqfly Schedule Mom Lifestyl
## 1 1 2 2 3 1 1 2 2 1 2
## 2 2 1 2 2 5 2 2 1 2 2
## 3 1 3 1 4 6 2 3 2 5 3
## 4 3 4 2 2 4 3 2 4 1 3
## 5 4 2 4 3 2 4 3 2 3 3
## 6 5 3 4 2 6 5 2 2 1 2
## PC1 PC2 PC3
## 1 -0.32868386 -0.8521834 0.1589147
## 2 -0.76859062 0.1638226 -0.6895576
## 3 0.18285990 -0.4793819 -0.1301905
## 4 0.20381401 0.4823934 1.7012390
## 5 0.05296912 1.4490622 0.3696490
## 6 -0.31880402 2.4967905 0.8699729
biplot.psych(pc2)