Loading libraries
library(readr)
library(corrplot)
package 㤼㸱corrplot㤼㸲 was built under R version 3.4.2corrplot 0.84 loaded
library(factoextra)
package 㤼㸱factoextra㤼㸲 was built under R version 3.4.2Loading required package: ggplot2
Welcome! Related Books: `Practical Guide To Cluster Analysis in R` at https://goo.gl/13EFCZ
library(GPArotation)
Loading data
cereal <- read_csv("D:/PG Business Analytics/AS/Group Assignment/cereal.csv")
Parsed with column specification:
cols(
.default = col_integer(),
Cereals = col_character()
)
See spec(...) for full column specifications.
What do we have here
summary(cereal)
Cereals Filling Natural Fibre Sweet Easy Salt Satisfying
Length:235 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :2.000
Class :character 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:4.000 1st Qu.:1.000 1st Qu.:3.000
Mode :character Median :4.000 Median :4.000 Median :4.000 Median :2.000 Median :5.000 Median :2.000 Median :4.000
Mean :3.881 Mean :3.783 Mean :3.528 Mean :2.506 Mean :4.532 Mean :1.991 Mean :4.004
3rd Qu.:4.500 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:5.000 3rd Qu.:3.000 3rd Qu.:5.000
Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000 Max. :6.000 Max. :4.000 Max. :6.000
Energy Fun Kids Soggy Economical Health Family Calories
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
1st Qu.:3.000 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:1.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:2.000
Median :4.000 Median :2.000 Median :4.000 Median :2.000 Median :3.000 Median :4.000 Median :4.000 Median :3.000
Mean :3.643 Mean :2.617 Mean :3.843 Mean :2.255 Mean :3.217 Mean :3.809 Mean :3.877 Mean :2.702
3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:5.000 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:3.000
Max. :5.000 Max. :5.000 Max. :6.000 Max. :5.000 Max. :5.000 Max. :5.000 Max. :6.000 Max. :5.000
Plain Crisp Regular Sugar Fruit Process Quality Treat Boring
Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.00
1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:2.00 1st Qu.:1.00
Median :2.000 Median :3.000 Median :3.000 Median :2.000 Median :1.000 Median :3.000 Median :4.000 Median :3.00 Median :2.00
Mean :2.268 Mean :3.204 Mean :3.072 Mean :2.145 Mean :1.694 Mean :2.936 Mean :3.694 Mean :2.63 Mean :1.83
3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:3.00 3rd Qu.:2.00
Max. :5.000 Max. :6.000 Max. :5.000 Max. :5.000 Max. :5.000 Max. :6.000 Max. :5.000 Max. :6.00 Max. :5.00
Nutritious
Min. :1.000
1st Qu.:3.000
Median :4.000
Mean :3.664
3rd Qu.:4.000
Max. :5.000
plot(hclust(dist(as.matrix(cereal[2:26])),method = "ward.D"), main ="Segment Valdation")
boxplot(cereal[2:26])
ok some have 1-5 while others have 1-6 range. Strange
counts<-as.data.frame(table(cereal[1]))
counts
Since the number of correspondants are different for all cerels we must scale all varialbes. Scaling using mean
data<-aggregate(.~Cereals,cereal, FUN=sum)
data<-cbind(data,counts[2])
data
This is all the responces added up, now dividing all by Freq.
for ( i in c(1:12)) {
data[i,2:26]<-data[i,2:26]/data[i,27]
}
data
boxplot(data[2:26])
cerealMat <- cor(data[2:26])
corrplot(cerealMat, method = "circle")
pca<-PCA(data[,2:26],scale.unit=T,ncp=dim(data)[2],graph = F)
summary(pca)
Call:
PCA(X = data[, 2:26], scale.unit = T, ncp = dim(data)[2], graph = F)
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7 Dim.8 Dim.9 Dim.10 Dim.11
Variance 9.488 7.936 2.485 1.652 1.246 0.769 0.487 0.454 0.196 0.181 0.105
% of var. 37.953 31.743 9.941 6.608 4.985 3.077 1.947 1.816 0.786 0.726 0.418
Cumulative % of var. 37.953 69.696 79.637 86.245 91.230 94.307 96.254 98.070 98.856 99.582 100.000
Individuals (the 10 first)
Dist Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
1 | 5.264 | 0.275 0.066 0.003 | -3.784 15.037 0.517 | -2.711 24.643 0.265 |
2 | 4.390 | 3.291 9.512 0.562 | 2.040 4.369 0.216 | 0.964 3.113 0.048 |
3 | 4.266 | -2.965 7.722 0.483 | 2.067 4.486 0.235 | 0.316 0.334 0.005 |
4 | 3.281 | 0.186 0.030 0.003 | 1.285 1.735 0.153 | -2.211 16.398 0.454 |
5 | 4.932 | 3.659 11.758 0.550 | -0.135 0.019 0.001 | -2.152 15.529 0.190 |
6 | 5.567 | -1.983 3.454 0.127 | 4.545 21.688 0.666 | -0.608 1.238 0.012 |
7 | 4.787 | 3.879 13.218 0.657 | 1.912 3.840 0.160 | 0.661 1.464 0.019 |
8 | 6.739 | -6.019 31.818 0.798 | 1.293 1.757 0.037 | 0.645 1.397 0.009 |
9 | 3.524 | -2.492 5.452 0.500 | 0.052 0.003 0.000 | 0.506 0.858 0.021 |
10 | 5.407 | 4.162 15.217 0.593 | 0.188 0.037 0.001 | 2.449 20.102 0.205 |
Variables (the 10 first)
Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
Filling | 0.810 6.915 0.656 | -0.178 0.398 0.032 | 0.244 2.393 0.059 |
Natural | 0.808 6.886 0.653 | -0.444 2.486 0.197 | 0.149 0.898 0.022 |
Fibre | 0.767 6.199 0.588 | -0.542 3.704 0.294 | -0.128 0.663 0.016 |
Sweet | 0.498 2.615 0.248 | 0.799 8.048 0.639 | -0.105 0.446 0.011 |
Easy | -0.055 0.032 0.003 | 0.213 0.573 0.045 | 0.706 20.027 0.498 |
Salt | 0.029 0.009 0.001 | 0.770 7.477 0.593 | -0.273 3.000 0.075 |
Satisfying | 0.755 6.007 0.570 | 0.154 0.297 0.024 | 0.485 9.446 0.235 |
Energy | 0.812 6.947 0.659 | 0.114 0.164 0.013 | 0.406 6.636 0.165 |
Fun | 0.220 0.510 0.048 | 0.876 9.665 0.767 | 0.260 2.715 0.067 |
Kids | -0.704 5.220 0.495 | 0.312 1.228 0.097 | 0.493 9.770 0.243 |
fviz_pca_biplot(pca11, repel = TRUE,
col.var = "#2E9FDF", # Variables color
col.ind = "#696969" # Individuals color
)
cerealKMO <- KMO(cereal[,-1])
cerealKMO
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = cereal[, -1])
Overall MSA = 0.85
MSA for each item =
Filling Natural Fibre Sweet Easy Salt Satisfying Energy Fun Kids Soggy Economical
0.89 0.90 0.88 0.78 0.83 0.82 0.91 0.91 0.85 0.68 0.63 0.73
Health Family Calories Plain Crisp Regular Sugar Fruit Process Quality Treat Boring
0.92 0.73 0.86 0.82 0.83 0.87 0.78 0.77 0.79 0.91 0.88 0.88
Nutritious
0.92
cereal_log<-cereal
for ( i in c(2:26)) {
cereal_log[,i]<-log10(cereal_log[,i])
}
factor <- fa(cereal_log[,2:26],nfactors = 4,rotate = "oblimin",fm="minres")
print(factor)
Factor Analysis using method = minres
Call: fa(r = cereal_log[, 2:26], nfactors = 4, rotate = "oblimin",
fm = "minres")
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR4 MR2 MR3 h2 u2 com
Filling 0.70 0.06 0.18 0.16 0.56 0.44 1.3
Natural 0.80 -0.02 -0.12 -0.01 0.67 0.33 1.0
Fibre 0.87 -0.08 0.03 -0.16 0.72 0.28 1.1
Sweet 0.02 0.28 0.65 0.03 0.60 0.40 1.4
Easy 0.24 0.04 0.10 0.24 0.14 0.86 2.4
Salt 0.01 -0.23 0.69 0.03 0.45 0.55 1.2
Satisfying 0.60 0.13 0.14 0.35 0.59 0.41 1.8
Energy 0.63 0.11 0.15 0.15 0.49 0.51 1.3
Fun 0.02 0.52 0.08 0.26 0.42 0.58 1.5
Kids -0.03 -0.03 0.04 0.85 0.71 0.29 1.0
Soggy 0.14 -0.54 0.09 0.09 0.24 0.76 1.2
Economical 0.01 -0.21 -0.28 0.39 0.27 0.73 2.4
Health 0.84 -0.02 -0.17 -0.03 0.77 0.23 1.1
Family 0.02 0.10 -0.10 0.75 0.61 0.39 1.1
Calories -0.09 -0.02 0.60 0.00 0.37 0.63 1.0
Plain -0.02 -0.70 0.00 0.15 0.49 0.51 1.1
Crisp -0.06 0.43 0.06 0.21 0.26 0.74 1.5
Regular 0.63 0.06 -0.02 -0.07 0.43 0.57 1.0
Sugar -0.13 0.06 0.76 -0.09 0.66 0.34 1.1
Fruit 0.27 0.44 0.18 -0.36 0.45 0.55 3.0
Process -0.09 -0.19 0.35 -0.02 0.16 0.84 1.7
Quality 0.59 0.16 -0.19 0.09 0.51 0.49 1.4
Treat 0.08 0.60 0.13 0.16 0.51 0.49 1.3
Boring -0.04 -0.58 0.12 -0.12 0.37 0.63 1.2
Nutritious 0.85 -0.04 -0.04 -0.03 0.71 0.29 1.0
MR1 MR4 MR2 MR3
SS loadings 5.14 2.59 2.37 2.06
Proportion Var 0.21 0.10 0.09 0.08
Cumulative Var 0.21 0.31 0.40 0.49
Proportion Explained 0.42 0.21 0.20 0.17
Cumulative Proportion 0.42 0.64 0.83 1.00
With factor correlations of
MR1 MR4 MR2 MR3
MR1 1.00 0.31 -0.17 0.09
MR4 0.31 1.00 0.26 0.15
MR2 -0.17 0.26 1.00 0.01
MR3 0.09 0.15 0.01 1.00
Mean item complexity = 1.4
Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the null model are 300 and the objective function was 12.15 with Chi Square of 2731.76
The degrees of freedom for the model are 206 and the objective function was 1.81
The root mean square of the residuals (RMSR) is 0.04
The df corrected root mean square of the residuals is 0.05
The harmonic number of observations is 235 with the empirical chi square 219.24 with prob < 0.25
The total number of observations was 235 with Likelihood Chi Square = 402.22 with prob < 8.5e-15
Tucker Lewis Index of factoring reliability = 0.881
RMSEA index = 0.067 and the 90 % confidence intervals are 0.054 0.073
BIC = -722.45
Fit based upon off diagonal values = 0.98
Measures of factor score adequacy
MR1 MR4 MR2 MR3
Correlation of scores with factors 0.97 0.90 0.92 0.92
Multiple R square of scores with factors 0.94 0.82 0.84 0.84
Minimum correlation of possible factor scores 0.88 0.64 0.68 0.68
fa.diagram(factor,main="Structure Diagram")
fa <- factanal(cereal[2:26], factor=4, rotation="varimax")
print(fa$loadings,cutoff=.3)
Loadings:
Factor1 Factor2 Factor3 Factor4
Filling 0.691
Natural 0.748
Fibre 0.815
Sweet 0.713 0.356
Easy 0.303
Salt 0.693
Satisfying 0.613 0.386
Energy 0.651
Fun 0.383 0.529
Kids 0.875
Soggy -0.456
Economical 0.392
Health 0.831
Family 0.798
Calories 0.613
Plain -0.650
Crisp 0.330 0.434
Regular 0.635
Sugar 0.822
Fruit 0.366 0.443
Process 0.373
Quality 0.663
Treat 0.623
Boring -0.504
Nutritious 0.834
Factor1 Factor2 Factor3 Factor4
SS loadings 5.188 2.643 2.407 2.373
Proportion Var 0.208 0.106 0.096 0.095
Cumulative Var 0.208 0.313 0.410 0.504
hence we see that four factors sufficently categorze our data
Category 1: Fibre, Nutritious, Health, Natural, Filling, Energy, Regular, Satifying, Quality
Category 2: Sugar, Salt, Sweet, Calories, Process
Category 3: Kids, Family, Economical
Category 4: Plain, Treat, Boring, Soggy, Fun, Fruit, Crisp