Loading libraries

library(readr)
library(corrplot)
package 㤼㸱corrplot㤼㸲 was built under R version 3.4.2corrplot 0.84 loaded
library(factoextra)
package 㤼㸱factoextra㤼㸲 was built under R version 3.4.2Loading required package: ggplot2
Welcome! Related Books: `Practical Guide To Cluster Analysis in R` at https://goo.gl/13EFCZ
library(GPArotation)

Loading data

cereal <- read_csv("D:/PG Business Analytics/AS/Group Assignment/cereal.csv")
Parsed with column specification:
cols(
  .default = col_integer(),
  Cereals = col_character()
)
See spec(...) for full column specifications.

What do we have here

summary(cereal)
   Cereals             Filling         Natural          Fibre           Sweet            Easy            Salt         Satisfying   
 Length:235         Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :2.000  
 Class :character   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:2.000   1st Qu.:4.000   1st Qu.:1.000   1st Qu.:3.000  
 Mode  :character   Median :4.000   Median :4.000   Median :4.000   Median :2.000   Median :5.000   Median :2.000   Median :4.000  
                    Mean   :3.881   Mean   :3.783   Mean   :3.528   Mean   :2.506   Mean   :4.532   Mean   :1.991   Mean   :4.004  
                    3rd Qu.:4.500   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:5.000   3rd Qu.:3.000   3rd Qu.:5.000  
                    Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :6.000   Max.   :4.000   Max.   :6.000  
     Energy           Fun             Kids           Soggy         Economical        Health          Family         Calories    
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:2.000   1st Qu.:3.000   1st Qu.:1.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:2.000  
 Median :4.000   Median :2.000   Median :4.000   Median :2.000   Median :3.000   Median :4.000   Median :4.000   Median :3.000  
 Mean   :3.643   Mean   :2.617   Mean   :3.843   Mean   :2.255   Mean   :3.217   Mean   :3.809   Mean   :3.877   Mean   :2.702  
 3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:5.000   3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:3.000  
 Max.   :5.000   Max.   :5.000   Max.   :6.000   Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :6.000   Max.   :5.000  
     Plain           Crisp          Regular          Sugar           Fruit          Process         Quality          Treat          Boring    
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.00   Min.   :1.00  
 1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:3.000   1st Qu.:2.00   1st Qu.:1.00  
 Median :2.000   Median :3.000   Median :3.000   Median :2.000   Median :1.000   Median :3.000   Median :4.000   Median :3.00   Median :2.00  
 Mean   :2.268   Mean   :3.204   Mean   :3.072   Mean   :2.145   Mean   :1.694   Mean   :2.936   Mean   :3.694   Mean   :2.63   Mean   :1.83  
 3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:3.00   3rd Qu.:2.00  
 Max.   :5.000   Max.   :6.000   Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :6.000   Max.   :5.000   Max.   :6.00   Max.   :5.00  
   Nutritious   
 Min.   :1.000  
 1st Qu.:3.000  
 Median :4.000  
 Mean   :3.664  
 3rd Qu.:4.000  
 Max.   :5.000  
plot(hclust(dist(as.matrix(cereal[2:26])),method = "ward.D"), main ="Segment Valdation")

boxplot(cereal[2:26])

ok some have 1-5 while others have 1-6 range. Strange

counts<-as.data.frame(table(cereal[1]))
counts

Since the number of correspondants are different for all cerels we must scale all varialbes. Scaling using mean

data<-aggregate(.~Cereals,cereal, FUN=sum)
data<-cbind(data,counts[2])
data

This is all the responces added up, now dividing all by Freq.

for ( i in c(1:12)) {
  data[i,2:26]<-data[i,2:26]/data[i,27]
}
data
boxplot(data[2:26])

cerealMat <- cor(data[2:26])
corrplot(cerealMat, method = "circle")

pca<-PCA(data[,2:26],scale.unit=T,ncp=dim(data)[2],graph = F)
summary(pca)

Call:
PCA(X = data[, 2:26], scale.unit = T, ncp = dim(data)[2], graph = F) 


Eigenvalues
                       Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6   Dim.7   Dim.8   Dim.9  Dim.10  Dim.11
Variance               9.488   7.936   2.485   1.652   1.246   0.769   0.487   0.454   0.196   0.181   0.105
% of var.             37.953  31.743   9.941   6.608   4.985   3.077   1.947   1.816   0.786   0.726   0.418
Cumulative % of var.  37.953  69.696  79.637  86.245  91.230  94.307  96.254  98.070  98.856  99.582 100.000

Individuals (the 10 first)
               Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr   cos2  
1          |  5.264 |  0.275  0.066  0.003 | -3.784 15.037  0.517 | -2.711 24.643  0.265 |
2          |  4.390 |  3.291  9.512  0.562 |  2.040  4.369  0.216 |  0.964  3.113  0.048 |
3          |  4.266 | -2.965  7.722  0.483 |  2.067  4.486  0.235 |  0.316  0.334  0.005 |
4          |  3.281 |  0.186  0.030  0.003 |  1.285  1.735  0.153 | -2.211 16.398  0.454 |
5          |  4.932 |  3.659 11.758  0.550 | -0.135  0.019  0.001 | -2.152 15.529  0.190 |
6          |  5.567 | -1.983  3.454  0.127 |  4.545 21.688  0.666 | -0.608  1.238  0.012 |
7          |  4.787 |  3.879 13.218  0.657 |  1.912  3.840  0.160 |  0.661  1.464  0.019 |
8          |  6.739 | -6.019 31.818  0.798 |  1.293  1.757  0.037 |  0.645  1.397  0.009 |
9          |  3.524 | -2.492  5.452  0.500 |  0.052  0.003  0.000 |  0.506  0.858  0.021 |
10         |  5.407 |  4.162 15.217  0.593 |  0.188  0.037  0.001 |  2.449 20.102  0.205 |

Variables (the 10 first)
              Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr   cos2  
Filling    |  0.810  6.915  0.656 | -0.178  0.398  0.032 |  0.244  2.393  0.059 |
Natural    |  0.808  6.886  0.653 | -0.444  2.486  0.197 |  0.149  0.898  0.022 |
Fibre      |  0.767  6.199  0.588 | -0.542  3.704  0.294 | -0.128  0.663  0.016 |
Sweet      |  0.498  2.615  0.248 |  0.799  8.048  0.639 | -0.105  0.446  0.011 |
Easy       | -0.055  0.032  0.003 |  0.213  0.573  0.045 |  0.706 20.027  0.498 |
Salt       |  0.029  0.009  0.001 |  0.770  7.477  0.593 | -0.273  3.000  0.075 |
Satisfying |  0.755  6.007  0.570 |  0.154  0.297  0.024 |  0.485  9.446  0.235 |
Energy     |  0.812  6.947  0.659 |  0.114  0.164  0.013 |  0.406  6.636  0.165 |
Fun        |  0.220  0.510  0.048 |  0.876  9.665  0.767 |  0.260  2.715  0.067 |
Kids       | -0.704  5.220  0.495 |  0.312  1.228  0.097 |  0.493  9.770  0.243 |
fviz_pca_biplot(pca11, repel = TRUE,
                col.var = "#2E9FDF", # Variables color
                col.ind = "#696969"  # Individuals color
)

cerealKMO <- KMO(cereal[,-1])
cerealKMO
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = cereal[, -1])
Overall MSA =  0.85
MSA for each item = 
   Filling    Natural      Fibre      Sweet       Easy       Salt Satisfying     Energy        Fun       Kids      Soggy Economical 
      0.89       0.90       0.88       0.78       0.83       0.82       0.91       0.91       0.85       0.68       0.63       0.73 
    Health     Family   Calories      Plain      Crisp    Regular      Sugar      Fruit    Process    Quality      Treat     Boring 
      0.92       0.73       0.86       0.82       0.83       0.87       0.78       0.77       0.79       0.91       0.88       0.88 
Nutritious 
      0.92 
cereal_log<-cereal
for ( i in c(2:26)) {
  cereal_log[,i]<-log10(cereal_log[,i])
}
factor <- fa(cereal_log[,2:26],nfactors = 4,rotate = "oblimin",fm="minres")
print(factor)
Factor Analysis using method =  minres
Call: fa(r = cereal_log[, 2:26], nfactors = 4, rotate = "oblimin", 
    fm = "minres")
Standardized loadings (pattern matrix) based upon correlation matrix
             MR1   MR4   MR2   MR3   h2   u2 com
Filling     0.70  0.06  0.18  0.16 0.56 0.44 1.3
Natural     0.80 -0.02 -0.12 -0.01 0.67 0.33 1.0
Fibre       0.87 -0.08  0.03 -0.16 0.72 0.28 1.1
Sweet       0.02  0.28  0.65  0.03 0.60 0.40 1.4
Easy        0.24  0.04  0.10  0.24 0.14 0.86 2.4
Salt        0.01 -0.23  0.69  0.03 0.45 0.55 1.2
Satisfying  0.60  0.13  0.14  0.35 0.59 0.41 1.8
Energy      0.63  0.11  0.15  0.15 0.49 0.51 1.3
Fun         0.02  0.52  0.08  0.26 0.42 0.58 1.5
Kids       -0.03 -0.03  0.04  0.85 0.71 0.29 1.0
Soggy       0.14 -0.54  0.09  0.09 0.24 0.76 1.2
Economical  0.01 -0.21 -0.28  0.39 0.27 0.73 2.4
Health      0.84 -0.02 -0.17 -0.03 0.77 0.23 1.1
Family      0.02  0.10 -0.10  0.75 0.61 0.39 1.1
Calories   -0.09 -0.02  0.60  0.00 0.37 0.63 1.0
Plain      -0.02 -0.70  0.00  0.15 0.49 0.51 1.1
Crisp      -0.06  0.43  0.06  0.21 0.26 0.74 1.5
Regular     0.63  0.06 -0.02 -0.07 0.43 0.57 1.0
Sugar      -0.13  0.06  0.76 -0.09 0.66 0.34 1.1
Fruit       0.27  0.44  0.18 -0.36 0.45 0.55 3.0
Process    -0.09 -0.19  0.35 -0.02 0.16 0.84 1.7
Quality     0.59  0.16 -0.19  0.09 0.51 0.49 1.4
Treat       0.08  0.60  0.13  0.16 0.51 0.49 1.3
Boring     -0.04 -0.58  0.12 -0.12 0.37 0.63 1.2
Nutritious  0.85 -0.04 -0.04 -0.03 0.71 0.29 1.0

                       MR1  MR4  MR2  MR3
SS loadings           5.14 2.59 2.37 2.06
Proportion Var        0.21 0.10 0.09 0.08
Cumulative Var        0.21 0.31 0.40 0.49
Proportion Explained  0.42 0.21 0.20 0.17
Cumulative Proportion 0.42 0.64 0.83 1.00

 With factor correlations of 
      MR1  MR4   MR2  MR3
MR1  1.00 0.31 -0.17 0.09
MR4  0.31 1.00  0.26 0.15
MR2 -0.17 0.26  1.00 0.01
MR3  0.09 0.15  0.01 1.00

Mean item complexity =  1.4
Test of the hypothesis that 4 factors are sufficient.

The degrees of freedom for the null model are  300  and the objective function was  12.15 with Chi Square of  2731.76
The degrees of freedom for the model are 206  and the objective function was  1.81 

The root mean square of the residuals (RMSR) is  0.04 
The df corrected root mean square of the residuals is  0.05 

The harmonic number of observations is  235 with the empirical chi square  219.24  with prob <  0.25 
The total number of observations was  235  with Likelihood Chi Square =  402.22  with prob <  8.5e-15 

Tucker Lewis Index of factoring reliability =  0.881
RMSEA index =  0.067  and the 90 % confidence intervals are  0.054 0.073
BIC =  -722.45
Fit based upon off diagonal values = 0.98
Measures of factor score adequacy             
                                                MR1  MR4  MR2  MR3
Correlation of scores with factors             0.97 0.90 0.92 0.92
Multiple R square of scores with factors       0.94 0.82 0.84 0.84
Minimum correlation of possible factor scores  0.88 0.64 0.68 0.68
fa.diagram(factor,main="Structure Diagram")

fa <- factanal(cereal[2:26], factor=4, rotation="varimax")
print(fa$loadings,cutoff=.3)

Loadings:
           Factor1 Factor2 Factor3 Factor4
Filling     0.691                         
Natural     0.748                         
Fibre       0.815                         
Sweet               0.713           0.356 
Easy                        0.303         
Salt                0.693                 
Satisfying  0.613           0.386         
Energy      0.651                         
Fun                         0.383   0.529 
Kids                        0.875         
Soggy                              -0.456 
Economical                  0.392         
Health      0.831                         
Family                      0.798         
Calories            0.613                 
Plain                              -0.650 
Crisp                       0.330   0.434 
Regular     0.635                         
Sugar               0.822                 
Fruit       0.366                   0.443 
Process             0.373                 
Quality     0.663                         
Treat                               0.623 
Boring                             -0.504 
Nutritious  0.834                         

               Factor1 Factor2 Factor3 Factor4
SS loadings      5.188   2.643   2.407   2.373
Proportion Var   0.208   0.106   0.096   0.095
Cumulative Var   0.208   0.313   0.410   0.504

hence we see that four factors sufficently categorze our data

Category 1: Fibre, Nutritious, Health, Natural, Filling, Energy, Regular, Satifying, Quality

Category 2: Sugar, Salt, Sweet, Calories, Process

Category 3: Kids, Family, Economical

Category 4: Plain, Treat, Boring, Soggy, Fun, Fruit, Crisp

LS0tDQp0aXRsZTogIkFzc2lnbm1lbnQgUHJvYmxlbSAxIg0Kb3V0cHV0OiBodG1sX25vdGVib29rDQotLS0NCkxvYWRpbmcgbGlicmFyaWVzDQoNCmBgYHtyfQ0KbGlicmFyeShyZWFkcikNCmxpYnJhcnkoY29ycnBsb3QpDQpsaWJyYXJ5KGZhY3RvZXh0cmEpDQpsaWJyYXJ5KHBzeWNoKQ0KbGlicmFyeShHUEFyb3RhdGlvbikNCmBgYA0KTG9hZGluZyBkYXRhDQoNCmBgYHtyfQ0KY2VyZWFsIDwtIHJlYWRfY3N2KCJEOi9QRyBCdXNpbmVzcyBBbmFseXRpY3MvQVMvR3JvdXAgQXNzaWdubWVudC9jZXJlYWwuY3N2IikNCg0KYGBgDQoNCldoYXQgZG8gd2UgaGF2ZSBoZXJlDQoNCmBgYHtyfQ0Kc3VtbWFyeShjZXJlYWwpDQpgYGANCg0KYGBge3J9DQpwbG90KGhjbHVzdChkaXN0KGFzLm1hdHJpeChjZXJlYWxbMjoyNl0pKSxtZXRob2QgPSAid2FyZC5EIiksIG1haW4gPSJTZWdtZW50IFZhbGRhdGlvbiIpDQpgYGANCg0KDQoNCg0KYGBge3J9DQpib3hwbG90KGNlcmVhbFsyOjI2XSkNCmBgYA0KDQpvayBzb21lIGhhdmUgMS01IHdoaWxlIG90aGVycyBoYXZlIDEtNiByYW5nZS4gU3RyYW5nZQ0KDQoNCg0KYGBge3J9DQpjb3VudHM8LWFzLmRhdGEuZnJhbWUodGFibGUoY2VyZWFsWzFdKSkNCmNvdW50cw0KYGBgDQoNClNpbmNlIHRoZSBudW1iZXIgb2YgY29ycmVzcG9uZGFudHMgYXJlIGRpZmZlcmVudCBmb3IgYWxsIGNlcmVscyB3ZSBtdXN0IHNjYWxlIGFsbCB2YXJpYWxiZXMuDQpTY2FsaW5nIHVzaW5nIG1lYW4NCg0KYGBge3J9DQpkYXRhPC1hZ2dyZWdhdGUoLn5DZXJlYWxzLGNlcmVhbCwgRlVOPXN1bSkNCmRhdGE8LWNiaW5kKGRhdGEsY291bnRzWzJdKQ0KZGF0YQ0KYGBgDQoNClRoaXMgaXMgYWxsIHRoZSByZXNwb25jZXMgYWRkZWQgdXAsIG5vdyBkaXZpZGluZyBhbGwgYnkgRnJlcS4NCg0KDQpgYGB7cn0NCmZvciAoIGkgaW4gYygxOjEyKSkgew0KICBkYXRhW2ksMjoyNl08LWRhdGFbaSwyOjI2XS9kYXRhW2ksMjddDQp9DQoNCmRhdGENCmBgYA0KDQoNCg0KYGBge3IsIGZpZy53aWR0aD0yNX0NCmJveHBsb3QoZGF0YVsyOjI2XSkNCmBgYA0KDQpgYGB7cn0NCmNlcmVhbE1hdCA8LSBjb3IoZGF0YVsyOjI2XSkNCmNvcnJwbG90KGNlcmVhbE1hdCwgbWV0aG9kID0gImNpcmNsZSIpDQpgYGANCg0KDQpgYGB7cn0NCnBjYTwtUENBKGRhdGFbLDI6MjZdLHNjYWxlLnVuaXQ9VCxuY3A9ZGltKGRhdGEpWzJdLGdyYXBoID0gRikNCnN1bW1hcnkocGNhKQ0KYGBgDQoNCg0KYGBge3J9DQoNCmZ2aXpfcGNhX2JpcGxvdChwY2ExMSwgcmVwZWwgPSBUUlVFLA0KICAgICAgICAgICAgICAgIGNvbC52YXIgPSAiIzJFOUZERiIsICMgVmFyaWFibGVzIGNvbG9yDQogICAgICAgICAgICAgICAgY29sLmluZCA9ICIjNjk2OTY5IiAgIyBJbmRpdmlkdWFscyBjb2xvcg0KKQ0KYGBgDQoNCg0KYGBge3J9DQpjZXJlYWxLTU8gPC0gS01PKGNlcmVhbFssLTFdKQ0KY2VyZWFsS01PDQpgYGANCg0KDQpgYGB7cn0NCmNlcmVhbF9sb2c8LWNlcmVhbA0KZm9yICggaSBpbiBjKDI6MjYpKSB7DQogIGNlcmVhbF9sb2dbLGldPC1sb2cxMChjZXJlYWxfbG9nWyxpXSkNCn0NCmBgYA0KDQoNCmBgYHtyfQ0KZmFjdG9yIDwtIGZhKGNlcmVhbF9sb2dbLDI6MjZdLG5mYWN0b3JzID0gNCxyb3RhdGUgPSAib2JsaW1pbiIsZm09Im1pbnJlcyIpDQpwcmludChmYWN0b3IpDQpgYGANCg0KDQpgYGB7ciwgZmlnLmhlaWdodD0yMCwgZmlnLndpZHRoPTE1fQ0KZmEuZGlhZ3JhbShmYWN0b3IsbWFpbj0iU3RydWN0dXJlIERpYWdyYW0iKQ0KYGBgDQoNCmBgYHtyfQ0KDQpmYSA8LSBmYWN0YW5hbChjZXJlYWxbMjoyNl0sIGZhY3Rvcj00LCByb3RhdGlvbj0idmFyaW1heCIpDQoNCnByaW50KGZhJGxvYWRpbmdzLGN1dG9mZj0uMykNCmBgYA0KDQoNCjxiPg0KaGVuY2Ugd2Ugc2VlIHRoYXQgZm91ciBmYWN0b3JzIHN1ZmZpY2VudGx5IGNhdGVnb3J6ZSBvdXIgZGF0YQ0KDQpDYXRlZ29yeSAxOiBGaWJyZSwgTnV0cml0aW91cywgSGVhbHRoLCBOYXR1cmFsLCBGaWxsaW5nLCBFbmVyZ3ksIFJlZ3VsYXIsIFNhdGlmeWluZywgUXVhbGl0eQ0KDQpDYXRlZ29yeSAyOiBTdWdhciwgU2FsdCwgU3dlZXQsIENhbG9yaWVzLCBQcm9jZXNzDQoNCkNhdGVnb3J5IDM6IEtpZHMsIEZhbWlseSwgRWNvbm9taWNhbA0KDQpDYXRlZ29yeSA0OiBQbGFpbiwgVHJlYXQsIEJvcmluZywgU29nZ3ksIEZ1biwgRnJ1aXQsIENyaXNwDQoNCjwvYj4NCg0K