Case

Private domestic airline operators raised their airfares in December 2000. Indian Airlines, the government-owned carrier, was the only domestic airline that did not follow suit. Available information showed that people still preferred to fly Jet Airways. Our group set out to ascertain the reasons for this preference since, according to Indian Airlines officials, domestic flyers are price-conscious customers.

Our study consisted of 20 respondents who had recently flown with Jet Airways. They were asked to indicate, on a seven-point scale (1 = completely agree, 7 = completely disagree), their agreement or disagreement with a set of 10 statements relating to their perceptions and attributes of the airline.

The 10 statements were as follows:

  1. They (Jet Airways) are always on time.
  2. The seats are very comfortable.
  3. I love the food they provide.
  4. Their air-hostesses are very beautiful.
  5. My boss/friend flies with the same airline.
  6. The airline has younger aircraft.
  7. I get the advantage of a frequent flyer program.
  8. It (the flight timing) suits my schedule.
  9. My mom feels safe when I fly Jet.
  10. Flying Jet complements my lifestyle and social standing in society.

Reading Data

raqData <- read.csv("Airlines.csv",header = TRUE)

Calculate the correlation matrix

raqmatrix <- round(cor(raqData),2) 
raqmatrix
##          Ontime Comfort  Food Hostess  Boss Airyoung Freqfly Schedule
## Ontime     1.00    0.14  0.87   -0.06  0.40     0.90   -0.02     0.07
## Comfort    0.14    1.00 -0.05    0.19 -0.16     0.06    0.08     0.92
## Food       0.87   -0.05  1.00    0.00  0.36     0.85    0.05    -0.13
## Hostess   -0.06    0.19  0.00    1.00  0.08    -0.10    0.95     0.26
## Boss       0.40   -0.16  0.36    0.08  1.00     0.50    0.07    -0.25
## Airyoung   0.90    0.06  0.85   -0.10  0.50     1.00   -0.05    -0.08
## Freqfly   -0.02    0.08  0.05    0.95  0.07    -0.05    1.00     0.16
## Schedule   0.07    0.92 -0.13    0.26 -0.25    -0.08    0.16     1.00
## Mom       -0.22   -0.07 -0.03   -0.32 -0.14    -0.11   -0.26    -0.13
## Lifestyl   0.01    0.17  0.04    0.89  0.05    -0.03    0.97     0.26
##            Mom Lifestyl
## Ontime   -0.22     0.01
## Comfort  -0.07     0.17
## Food     -0.03     0.04
## Hostess  -0.32     0.89
## Boss     -0.14     0.05
## Airyoung -0.11    -0.03
## Freqfly  -0.26     0.97
## Schedule -0.13     0.26
## Mom       1.00    -0.23
## Lifestyl -0.23     1.00

Run Bartlett's test using the psych package

library(psych)
## Warning: package 'psych' was built under R version 3.6.1
cortest.bartlett(raqData)
## R was not square, finding R from data
## $chisq
## [1] 194.4825
## 
## $p.value
## [1] 1.727031e-20
## 
## $df
## [1] 45

The test is highly significant (p < 0.001), so the correlations are not all zero and factor analysis is appropriate.
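To make the statistic less of a black box, here is a minimal base R sketch of Bartlett's test of sphericity computed by hand from a correlation matrix. The function name `bartlett_sphericity` and the toy data are hypothetical, introduced only for illustration; the formula is the standard chi-square approximation with p(p - 1)/2 degrees of freedom, which gives 45 for the 10 items used here.

```r
# Bartlett's test of sphericity computed by hand (base R only).
# H0: the population correlation matrix is an identity matrix.
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)                                       # number of variables
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))  # test statistic
  df <- p * (p - 1) / 2                              # degrees of freedom
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Hypothetical data: 20 respondents, 4 items forming two correlated pairs.
set.seed(1)
f1 <- rnorm(20)
f2 <- rnorm(20)
toy <- data.frame(a = f1 + rnorm(20, sd = 0.3),
                  b = f1 + rnorm(20, sd = 0.3),
                  c = f2 + rnorm(20, sd = 0.3),
                  d = f2 + rnorm(20, sd = 0.3))

res <- bartlett_sphericity(cor(toy), n = nrow(toy))
res
```

An identity correlation matrix has determinant 1, so its statistic is 0; the more the items correlate, the smaller the determinant and the larger the chi-square.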

KMO Test

# KMO Kaiser-Meyer-Olkin Measure of Sampling Adequacy
# Function by G. Jay Kerns, Ph.D., Youngstown State University (http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22816.html)

kmo = function( data ){
  library(MASS) 
  X <- cor(as.matrix(data)) 
  iX <- ginv(X) 
  S2 <- diag(diag((iX^-1)))
  AIS <- S2%*%iX%*%S2                      # anti-image covariance matrix
  IS <- X+AIS-2*S2                         # image covariance matrix
  Dai <- sqrt(diag(diag(AIS)))
  IR <- ginv(Dai)%*%IS%*%ginv(Dai)         # image correlation matrix
  AIR <- ginv(Dai)%*%AIS%*%ginv(Dai)       # anti-image correlation matrix
  a <- apply((AIR - diag(diag(AIR)))^2, 2, sum)
  AA <- sum(a) 
  b <- apply((X - diag(nrow(X)))^2, 2, sum)
  BB <- sum(b)
  MSA <- b/(b+a)                        # indiv. measures of sampling adequacy
  AIR <- AIR-diag(nrow(AIR))+diag(MSA)  
  # Examine the anti-image of the correlation matrix. That is the  negative of the partial correlations, partialling out all other variables.
  kmo <- BB/(AA+BB)                     # overall KMO statistic
  # Reporting the conclusion
  if (kmo >= 0.00 && kmo < 0.50) {test <- 'The KMO test yields a degree of common variance unacceptable for FA.'}
  else if (kmo >= 0.50 && kmo < 0.60) {test <- 'The KMO test yields a degree of common variance miserable.'}
  else if (kmo >= 0.60 && kmo < 0.70) {test <- 'The KMO test yields a degree of common variance mediocre.'}
  else if (kmo >= 0.70 && kmo < 0.80) {test <- 'The KMO test yields a degree of common variance middling.'}
  else if (kmo >= 0.80 && kmo < 0.90) {test <- 'The KMO test yields a degree of common variance meritorious.'}
  else {test <- 'The KMO test yields a degree of common variance marvelous.'}

  ans <- list(overall = kmo,
              report = test,
              individual = MSA,
              AIS = AIS,
              AIR = AIR)
  return(ans)
} 

#To use this function:
kmo(raqData)
## $overall
## [1] 0.5179028
## 
## $report
## [1] "The KMO test yields a degree of common variance miserable."
## 
## $individual
##    Ontime   Comfort      Food   Hostess      Boss  Airyoung   Freqfly 
## 0.5574881 0.4604544 0.5940986 0.4705232 0.4816402 0.7302882 0.4660346 
##  Schedule       Mom  Lifestyl 
## 0.5138174 0.2851663 0.5161732 
## 
## $AIS
##               [,1]         [,2]         [,3]         [,4]        [,5]
##  [1,]  0.080685725 -0.001754032 -0.066737059  0.019404151 -0.04325037
##  [2,] -0.001754032  0.106015361  0.026223810 -0.019024735  0.03157028
##  [3,] -0.066737059  0.026223810  0.127700180 -0.022134807  0.07957348
##  [4,]  0.019404151 -0.019024735 -0.022134807  0.027050858 -0.05130881
##  [5,] -0.043250368  0.031570283  0.079573484 -0.051308815  0.53173568
##  [6,] -0.047619942 -0.042776739 -0.031123114  0.003520371 -0.07634157
##  [7,] -0.008309901  0.008937190  0.008023899 -0.012921104  0.02342413
##  [8,] -0.030850555 -0.080341384  0.011160699 -0.009269619  0.04505619
##  [9,]  0.107278418 -0.057810992 -0.129112274  0.051193132 -0.07169968
## [10,]  0.010965535 -0.010465889 -0.009331248  0.016701767 -0.03140171
##                [,6]          [,7]         [,8]         [,9]        [,10]
##  [1,] -4.761994e-02 -8.309901e-03 -0.030850555  0.107278418  0.010965535
##  [2,] -4.277674e-02  8.937190e-03 -0.080341384 -0.057810992 -0.010465889
##  [3,] -3.112311e-02  8.023899e-03  0.011160699 -0.129112274 -0.009331248
##  [4,]  3.520371e-03 -1.292110e-02 -0.009269619  0.051193132  0.016701767
##  [5,] -7.634157e-02  2.342413e-02  0.045056194 -0.071699678 -0.031401707
##  [6,]  1.234105e-01  5.486839e-05  0.042757206 -0.003881902 -0.002183054
##  [7,]  5.486839e-05  7.553280e-03  0.006140812 -0.017510153 -0.011443259
##  [8,]  4.275721e-02  6.140812e-03  0.102383519  0.001636157 -0.011846440
##  [9,] -3.881902e-03 -1.751015e-02  0.001636157  0.573978329  0.018019865
## [10,] -2.183054e-03 -1.144326e-02 -0.011846440  0.018019865  0.019493716
## 
## $AIR
##              [,1]        [,2]        [,3]       [,4]       [,5]
##  [1,]  0.55748807 -0.01896508 -0.65746548  0.4153419 -0.2088063
##  [2,] -0.01896508  0.46045438  0.22538014 -0.3552579  0.1329677
##  [3,] -0.65746548  0.22538014  0.59409865 -0.3766079  0.3053688
##  [4,]  0.41534192 -0.35525787 -0.37660793  0.4705232 -0.4278126
##  [5,] -0.20880628  0.13296771  0.30536884 -0.4278126  0.4816402
##  [6,] -0.47721544 -0.37397904 -0.24791983  0.0609287 -0.2980144
##  [7,] -0.33661210  0.31582667  0.25835801 -0.9039436  0.3696135
##  [8,] -0.33942952 -0.77115182  0.09760693 -0.1761393  0.1931041
##  [9,]  0.49850126 -0.23435719 -0.47689638  0.4108407 -0.1297841
## [10,]  0.27649301 -0.23022071 -0.18702371  0.7273184 -0.3084310
##               [,6]         [,7]         [,8]         [,9]       [,10]
##  [1,] -0.477215443 -0.336612104 -0.339429516  0.498501258  0.27649301
##  [2,] -0.373979042  0.315826674 -0.771151817 -0.234357185 -0.23022071
##  [3,] -0.247919833  0.258358006  0.097606925 -0.476896382 -0.18702371
##  [4,]  0.060928700 -0.903943611 -0.176139333  0.410840674  0.72731841
##  [5,] -0.298014359  0.369613495  0.193104136 -0.129784119 -0.30843101
##  [6,]  0.730288150  0.001797125  0.380380526 -0.014585481 -0.04450831
##  [7,]  0.001797125  0.466034647  0.220822327 -0.265934336 -0.94304979
##  [8,]  0.380380526  0.220822327  0.513817414  0.006749354 -0.26517069
##  [9,] -0.014585481 -0.265934336  0.006749354  0.285166299  0.17035562
## [10,] -0.044508312 -0.943049788 -0.265170694  0.170355616  0.51617323

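The overall KMO (0.52) is just above 0.5, so we proceed with the analysis. Note, however, that the individual values for Comfort, Hostess, Boss, Freqfly and Mom fall below 0.5, with Mom (0.29) the weakest.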
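The `$individual` values can be screened against the 0.5 cutoff directly rather than by eye. A minimal sketch, with the numbers copied from the output above; `msa` and `low` are names introduced here for illustration.

```r
# Individual measures of sampling adequacy, copied from the kmo() output.
msa <- c(Ontime = 0.5574881, Comfort = 0.4604544, Food = 0.5940986,
         Hostess = 0.4705232, Boss = 0.4816402, Airyoung = 0.7302882,
         Freqfly = 0.4660346, Schedule = 0.5138174, Mom = 0.2851663,
         Lifestyl = 0.5161732)

# Items below the conventional 0.5 cutoff, weakest first.
low <- sort(msa[msa < 0.5])
low
```

With the real function output, the same check could be written as `sort(kmo(raqData)$individual)`.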

Let's see the determinant of the correlation matrix.

det(raqmatrix)
## [1] 2.659787e-06

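The determinant (2.66e-06) is below the 0.00001 cutoff, not above it, which signals multicollinearity. This is consistent with the very high correlations in the matrix above (for example, 0.97 between Freqfly and Lifestyl). We continue with that caveat in mind.

Now we extract the factors. First, set the number of factors equal to the number of variables.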

pc1 <- principal(raqData,nfactors=10,rotate="none")
pc1
## Principal Components Analysis
## Call: principal(r = raqData, nfactors = 10, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8   PC9  PC10 h2
## Ontime    0.06  0.94  0.23 -0.01 -0.15 -0.01  0.13  0.13 -0.09 -0.01  1
## Comfort   0.41 -0.04  0.88  0.02  0.18 -0.03 -0.12 -0.05 -0.12  0.00  1
## Food      0.03  0.91  0.02  0.23 -0.23  0.21 -0.02 -0.12  0.01  0.00  1
## Hostess   0.94 -0.05 -0.24  0.05  0.02  0.13 -0.16  0.10  0.02 -0.03  1
## Boss      0.04  0.61 -0.30 -0.18  0.71  0.05  0.04 -0.02  0.00  0.00  1
## Airyoung -0.02  0.95  0.12  0.07 -0.02 -0.21 -0.13  0.00  0.09  0.00  1
## Freqfly   0.93 -0.01 -0.34  0.15 -0.05 -0.03  0.02  0.02 -0.01  0.05  1
## Schedule  0.48 -0.16  0.83 -0.01  0.09  0.07  0.12  0.02  0.13  0.00  1
## Mom      -0.40 -0.18  0.03  0.87  0.22  0.00  0.01  0.04  0.00  0.00  1
## Lifestyl  0.93 -0.01 -0.23  0.18 -0.02 -0.13  0.13 -0.10 -0.02 -0.03  1
##                u2 com
## Ontime   -2.2e-16 1.3
## Comfort  -1.6e-15 1.6
## Food     -2.9e-15 1.4
## Hostess  -8.9e-16 1.3
## Boss      2.2e-16 2.5
## Airyoung -2.4e-15 1.2
## Freqfly  -4.4e-16 1.3
## Schedule  0.0e+00 1.8
## Mom      -2.0e-15 1.7
## Lifestyl -4.4e-16 1.3
## 
##                        PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9 PC10
## SS loadings           3.18 3.05 1.84 0.91 0.67 0.13 0.11 0.06 0.05    0
## Proportion Var        0.32 0.30 0.18 0.09 0.07 0.01 0.01 0.01 0.00    0
## Cumulative Var        0.32 0.62 0.81 0.90 0.96 0.98 0.99 0.99 1.00    1
## Proportion Explained  0.32 0.30 0.18 0.09 0.07 0.01 0.01 0.01 0.00    0
## Cumulative Proportion 0.32 0.62 0.81 0.90 0.96 0.98 0.99 0.99 1.00    1
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 10 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0 
##  with the empirical chi square  0  with prob <  NA 
## 
## Fit based upon off diagonal values = 1
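The SS loadings row above holds the eigenvalues, so the usual "eigenvalue greater than 1" (Kaiser) rule can be checked numerically. A small sketch with the values copied from the output; `ev`, `n_keep` and `cum_var` are illustrative names.

```r
# Eigenvalues (SS loadings) copied from the pc1 output above.
ev <- c(3.18, 3.05, 1.84, 0.91, 0.67, 0.13, 0.11, 0.06, 0.05, 0)

n_keep  <- sum(ev > 1)               # Kaiser criterion: keep eigenvalues above 1
cum_var <- cumsum(ev) / length(ev)   # each standardized item contributes variance 1

n_keep
round(cum_var[n_keep], 2)
```

This counts three components, which together carry about 81% of the total variance, matching the Cumulative Var row.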

The SS loadings (eigenvalues) suggest that the first three components are sufficient: only these have eigenvalues above 1. The fit based upon off-diagonal values is also above 0.96.

Scree plot

plot(pc1$values,type="b")

The scree plot suggests 6 components, while the eigenvalue criterion suggests 3. We rerun the analysis with 3 factors.

pc2 <-  principal(raqData, nfactors = 3, rotate = "none")
pc2
## Principal Components Analysis
## Call: principal(r = raqData, nfactors = 3, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            PC1   PC2   PC3   h2    u2 com
## Ontime    0.06  0.94  0.23 0.93 0.067 1.1
## Comfort   0.41 -0.04  0.88 0.93 0.066 1.4
## Food      0.03  0.91  0.02 0.84 0.165 1.0
## Hostess   0.94 -0.05 -0.24 0.94 0.057 1.1
## Boss      0.04  0.61 -0.30 0.46 0.538 1.5
## Airyoung -0.02  0.95  0.12 0.92 0.078 1.0
## Freqfly   0.93 -0.01 -0.34 0.97 0.029 1.3
## Schedule  0.48 -0.16  0.83 0.95 0.045 1.7
## Mom      -0.40 -0.18  0.03 0.19 0.807 1.4
## Lifestyl  0.93 -0.01 -0.23 0.92 0.077 1.1
## 
##                        PC1  PC2  PC3
## SS loadings           3.18 3.05 1.84
## Proportion Var        0.32 0.30 0.18
## Cumulative Var        0.32 0.62 0.81
## Proportion Explained  0.39 0.38 0.23
## Cumulative Proportion 0.39 0.77 1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 3 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.06 
##  with the empirical chi square  6.16  with prob <  1 
## 
## Fit based upon off diagonal values = 0.98

Let's check whether the factor solution is adequate by reproducing the correlation matrix from the pc2 loadings.

loadings <- factor.model(pc2$loadings)
communality <- diag(loadings)
communality
##    Ontime   Comfort      Food   Hostess      Boss  Airyoung   Freqfly 
## 0.9327081 0.9344951 0.8351882 0.9434903 0.4622671 0.9223321 0.9711102 
##  Schedule       Mom  Lifestyl 
## 0.9547383 0.1925121 0.9232052

The diagonal of this matrix contains the communalities after extraction. The difference between the observed and the reproduced correlation matrix is called the residual matrix.

residuals <- factor.residuals(raqmatrix,pc2$loadings)
uniqueness <- diag(residuals)
uniqueness
##     Ontime    Comfort       Food    Hostess       Boss   Airyoung 
## 0.06729189 0.06550491 0.16481183 0.05650968 0.53773288 0.07766786 
##    Freqfly   Schedule        Mom   Lifestyl 
## 0.02888985 0.04526174 0.80748791 0.07679481

The diagonal of this matrix is the uniqueness.
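Both diagonals can be reproduced by hand, which makes the relationship explicit: a communality is the row sum of squared loadings, and the uniqueness is one minus that. The sketch below uses the loadings of three items copied from the pc2 output. Because those printed loadings are rounded to two decimals, the results agree with the values above only to about two decimals.

```r
# Unrotated loadings for three items, copied (rounded) from the pc2 output.
L <- matrix(c( 0.06,  0.94,  0.23,    # Ontime
               0.93, -0.01, -0.34,    # Freqfly
              -0.40, -0.18,  0.03),   # Mom
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Ontime", "Freqfly", "Mom"),
                            c("PC1", "PC2", "PC3")))

reproduced <- L %*% t(L)    # model-implied correlations among these items
h2 <- diag(reproduced)      # communalities = row sums of squared loadings
u2 <- 1 - h2                # uniqueness = variance left unexplained

round(cbind(h2, u2), 2)
```

Mom stands out with a communality near 0.19, meaning that the three components explain little of its variance.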

residuals<-as.matrix(residuals[upper.tri(residuals)])

This command re-creates the object residuals by using only the upper triangle of the original matrix. We now have an object called residuals that contains the residuals stored in a column. This is handy because it makes it easy to calculate various things.

large.resid<-abs(residuals) > 0.05
# proportion of the large residuals
sum(large.resid)/nrow(residuals)
## [1] 0.2666667

Other residual statistics, such as the mean, are skipped here.
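For reference, a sketch of how such summaries could be computed. The residual vector here is made up for illustration and is not taken from the airline data; with the real analysis, the one-column `residuals` object created above would take its place. The root mean square shown is a plain RMS of the off-diagonal residuals, which may not match the RMSR that psych reports.

```r
# Hypothetical off-diagonal residuals, for illustration only.
resid_vec <- c(0.02, -0.07, 0.01, 0.06, -0.03, 0.00, 0.04, -0.01, 0.09, -0.02)

prop_large <- mean(abs(resid_vec) > 0.05)  # share of residuals larger than 0.05
mean_resid <- mean(resid_vec)              # average residual
rms_resid  <- sqrt(mean(resid_vec^2))      # root mean square residual

c(prop_large = prop_large, mean = mean_resid, rms = rms_resid)
```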

Rotation

Orthogonal Rotation

We can set rotate = "varimax" in the principal() function, but the full output is lengthy.

The print.psych() command prints the factor loading matrix for the model pc3, displaying only loadings above 0.3 in absolute value (cut = 0.3). Adding sort = TRUE would also sort the items by the size of their loadings.

pc3 <- principal(raqData, nfactors=3, rotate="varimax")
pc3
## Principal Components Analysis
## Call: principal(r = raqData, nfactors = 3, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            RC2   RC1   RC3   h2    u2 com
## Ontime    0.95  0.00  0.15 0.93 0.067 1.1
## Comfort   0.04  0.09  0.96 0.93 0.066 1.0
## Food      0.91  0.04 -0.05 0.84 0.165 1.0
## Hostess  -0.06  0.96  0.10 0.94 0.057 1.0
## Boss      0.58  0.15 -0.33 0.46 0.538 1.7
## Airyoung  0.96 -0.04  0.02 0.92 0.078 1.0
## Freqfly  -0.03  0.99 -0.01 0.97 0.029 1.0
## Schedule -0.08  0.18  0.96 0.95 0.045 1.1
## Mom      -0.18 -0.39 -0.09 0.19 0.807 1.5
## Lifestyl -0.02  0.96  0.10 0.92 0.077 1.0
## 
##                        RC2  RC1  RC3
## SS loadings           3.04 3.03 2.00
## Proportion Var        0.30 0.30 0.20
## Cumulative Var        0.30 0.61 0.81
## Proportion Explained  0.38 0.38 0.25
## Cumulative Proportion 0.38 0.75 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 3 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.06 
##  with the empirical chi square  6.16  with prob <  1 
## 
## Fit based upon off diagonal values = 0.98
print.psych(pc3, cut = 0.3)
## Principal Components Analysis
## Call: principal(r = raqData, nfactors = 3, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##            RC2   RC1   RC3   h2    u2 com
## Ontime    0.95             0.93 0.067 1.1
## Comfort               0.96 0.93 0.066 1.0
## Food      0.91             0.84 0.165 1.0
## Hostess         0.96       0.94 0.057 1.0
## Boss      0.58       -0.33 0.46 0.538 1.7
## Airyoung  0.96             0.92 0.078 1.0
## Freqfly         0.99       0.97 0.029 1.0
## Schedule              0.96 0.95 0.045 1.1
## Mom            -0.39       0.19 0.807 1.5
## Lifestyl        0.96       0.92 0.077 1.0
## 
##                        RC2  RC1  RC3
## SS loadings           3.04 3.03 2.00
## Proportion Var        0.30 0.30 0.20
## Cumulative Var        0.30 0.61 0.81
## Proportion Explained  0.38 0.38 0.25
## Cumulative Proportion 0.38 0.75 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 3 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.06 
##  with the empirical chi square  6.16  with prob <  1 
## 
## Fit based upon off diagonal values = 0.98
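An orthogonal rotation such as varimax redistributes variance among the components but leaves each item's communality unchanged, which is why the h2 column is identical in the pc2 and pc3 outputs. The sketch below shows this with base R's `varimax()` applied to the rounded pc2 loadings. It is not necessarily the exact routine that `principal()` runs internally, and column order or signs may differ from the RC columns printed above.

```r
# Unrotated loadings copied (rounded) from the pc2 output above.
L <- matrix(c( 0.06,  0.94,  0.23,
               0.41, -0.04,  0.88,
               0.03,  0.91,  0.02,
               0.94, -0.05, -0.24,
               0.04,  0.61, -0.30,
              -0.02,  0.95,  0.12,
               0.93, -0.01, -0.34,
               0.48, -0.16,  0.83,
              -0.40, -0.18,  0.03,
               0.93, -0.01, -0.23),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Ontime", "Comfort", "Food", "Hostess", "Boss",
                              "Airyoung", "Freqfly", "Schedule", "Mom", "Lifestyl"),
                            c("PC1", "PC2", "PC3")))

rot     <- varimax(L)              # stats::varimax, an orthogonal rotation
rotated <- unclass(rot$loadings)   # plain matrix of rotated loadings

h2_before <- rowSums(L^2)          # communalities before rotation
h2_after  <- rowSums(rotated^2)    # communalities after rotation

max(abs(h2_before - h2_after))     # effectively zero
round(crossprod(rot$rotmat), 10)   # identity matrix: the rotation is orthogonal
```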

The three factors extracted together account for 81% of the variance. Factor 1 (RC2) is a combination of Ontime, Food and Airyoung. We can classify it as “Customer Service”.

Factor 2 (RC1) is a combination of Hostess, Freqfly and Lifestyl. This can be classified as “Flyer incentive”.

Factor 3 (RC3) is a combination of Comfort and Schedule. We can classify it as “Convenience”.
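The grouping can also be derived mechanically by assigning each item to the component on which it has the largest absolute loading. A sketch using the rotated loadings copied from the pc3 output; `RL`, `dominant` and `groups` are illustrative names.

```r
# Rotated loadings copied from the pc3 output (columns in printed order).
RL <- matrix(c( 0.95,  0.00,  0.15,
                0.04,  0.09,  0.96,
                0.91,  0.04, -0.05,
               -0.06,  0.96,  0.10,
                0.58,  0.15, -0.33,
                0.96, -0.04,  0.02,
               -0.03,  0.99, -0.01,
               -0.08,  0.18,  0.96,
               -0.18, -0.39, -0.09,
               -0.02,  0.96,  0.10),
             ncol = 3, byrow = TRUE,
             dimnames = list(c("Ontime", "Comfort", "Food", "Hostess", "Boss",
                               "Airyoung", "Freqfly", "Schedule", "Mom", "Lifestyl"),
                             c("RC2", "RC1", "RC3")))

# For each item, pick the component with the largest absolute loading.
dominant <- colnames(RL)[apply(abs(RL), 1, which.max)]
names(dominant) <- rownames(RL)

groups <- split(names(dominant), dominant)
groups
```

This rule also places Boss with RC2 and Mom with RC1, but their loadings (0.58 and -0.39) are much weaker than those of the other items, so they are borderline members of those groups.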

head(pc2$scores)    # access scores by pc2$scores
##              PC1        PC2        PC3
## [1,] -0.32868386 -0.8521834  0.1589147
## [2,] -0.76859062  0.1638226 -0.6895576
## [3,]  0.18285990 -0.4793819 -0.1301905
## [4,]  0.20381401  0.4823934  1.7012390
## [5,]  0.05296912  1.4490622  0.3696490
## [6,] -0.31880402  2.4967905  0.8699729
raqData1 <- cbind(raqData[,1:10], pc2$scores)
# bind the factor scores to raqData dataframe for other use
head(raqData1)
##   Ontime Comfort Food Hostess Boss Airyoung Freqfly Schedule Mom Lifestyl
## 1      1       2    2       3    1        1       2        2   1        2
## 2      2       1    2       2    5        2       2        1   2        2
## 3      1       3    1       4    6        2       3        2   5        3
## 4      3       4    2       2    4        3       2        4   1        3
## 5      4       2    4       3    2        4       3        2   3        3
## 6      5       3    4       2    6        5       2        2   1        2
##           PC1        PC2        PC3
## 1 -0.32868386 -0.8521834  0.1589147
## 2 -0.76859062  0.1638226 -0.6895576
## 3  0.18285990 -0.4793819 -0.1301905
## 4  0.20381401  0.4823934  1.7012390
## 5  0.05296912  1.4490622  0.3696490
## 6 -0.31880402  2.4967905  0.8699729
biplot.psych(pc2)