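This section loads all the libraries used in the analysis below. The messages that follow are the package start-up output: warnings about the R version each package was built under, and notices about functions masked by later packages.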
## Warning: package 'ggplot2' was built under R version 3.4.1
## Warning: package 'ggthemes' was built under R version 3.4.1
## Warning: package 'scales' was built under R version 3.4.1
## Warning: package 'dplyr' was built under R version 3.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Warning: package 'mice' was built under R version 3.4.2
## Loading required package: lattice
## Warning: package 'randomForest' was built under R version 3.4.1
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:ggplot2':
##
## margin
## Warning: package 'rpart' was built under R version 3.4.2
## Warning: package 'ROCR' was built under R version 3.4.1
## Loading required package: gplots
## Warning: package 'gplots' was built under R version 3.4.1
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
## Warning: package 'rpart.plot' was built under R version 3.4.2
## Warning: package 'corrr' was built under R version 3.4.1
## Warning: package 'corrplot' was built under R version 3.4.2
## corrplot 0.84 loaded
## Warning: package 'glue' was built under R version 3.4.2
##
## Attaching package: 'glue'
## The following object is masked from 'package:dplyr':
##
## collapse
## Warning: package 'caTools' was built under R version 3.4.1
## Warning: package 'data.table' was built under R version 3.4.2
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## Loading required package: knitr
## Warning: package 'knitr' was built under R version 3.4.2
## Loading required package: geosphere
## Warning: package 'geosphere' was built under R version 3.4.2
## Loading required package: gmapsdistance
## Warning: package 'gmapsdistance' was built under R version 3.4.2
## Loading required package: tidyr
## Warning: package 'tidyr' was built under R version 3.4.2
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:mice':
##
## complete
## Warning: package 'car' was built under R version 3.4.2
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## Warning: package 'caret' was built under R version 3.4.1
## Warning: package 'gclus' was built under R version 3.4.1
## Loading required package: cluster
## Warning: package 'visdat' was built under R version 3.4.1
## Warning: package 'psych' was built under R version 3.4.2
##
## Attaching package: 'psych'
## The following object is masked from 'package:car':
##
## logit
## The following object is masked from 'package:randomForest':
##
## outlier
## The following objects are masked from 'package:scales':
##
## alpha, rescale
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
## Warning: package 'leaflet' was built under R version 3.4.1
## Warning: package 'leaflet.extras' was built under R version 3.4.1
## Warning: package 'GPArotation' was built under R version 3.4.1
## Warning: package 'MVN' was built under R version 3.4.2
## sROC 0.1-2 loaded
##
## Attaching package: 'MVN'
## The following object is masked from 'package:psych':
##
## mardia
## Warning: package 'MASS' was built under R version 3.4.1
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## Warning: package 'psy' was built under R version 3.4.1
##
## Attaching package: 'psy'
## The following object is masked from 'package:psych':
##
## wkappa
## Warning: package 'corpcor' was built under R version 3.4.1
## Warning: package 'fastmatch' was built under R version 3.4.1
##
## Attaching package: 'fastmatch'
## The following object is masked from 'package:dplyr':
##
## coalesce
## Warning: package 'plyr' was built under R version 3.4.1
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
This section loads and explores the data. All variables except the first column (the cereal name) are numeric, ordinal rating scores on a shared scale. Most items run from 1 to 5, although the summary below shows a maximum of 6 for Easy, Satisfying, Kids, Family, Crisp, Process and Treat, which may be worth checking against the questionnaire. The exploration shows no missing records and no obvious outliers, which simplifies the analysis.
#### Loading the data for processing
cereal_data <- read.csv('cereal.csv')
## Look at the structure and summary of the variables. The summary shows no missing values (no NA counts appear).
## It also shows that the first variable, the cereal name, is a factor and all the others are integer ratings.
str(cereal_data)
## 'data.frame': 235 obs. of 26 variables:
## $ Cereals : Factor w/ 12 levels "AllBran","CMuesli",..: 12 9 9 2 3 8 9 9 8 3 ...
## $ Filling : int 5 1 5 5 4 4 4 4 4 4 ...
## $ Natural : int 5 2 4 5 5 4 4 3 3 3 ...
## $ Fibre : int 5 2 5 5 3 4 3 3 3 3 ...
## $ Sweet : int 1 1 5 3 2 2 2 2 2 2 ...
## $ Easy : int 2 5 5 5 5 5 5 5 5 5 ...
## $ Salt : int 1 2 3 2 2 2 1 1 1 1 ...
## $ Satisfying: int 5 5 5 5 5 5 5 5 5 5 ...
## $ Energy : int 4 1 5 5 4 4 5 4 4 4 ...
## $ Fun : int 1 1 5 5 5 5 5 4 4 4 ...
## $ Kids : int 4 5 5 5 5 5 5 5 5 5 ...
## $ Soggy : int 5 3 3 3 1 1 1 1 1 1 ...
## $ Economical: int 5 5 3 3 5 5 5 3 3 3 ...
## $ Health : int 5 2 5 5 5 4 5 4 4 4 ...
## $ Family : int 5 5 5 5 3 5 5 5 5 5 ...
## $ Calories : int 1 1 1 1 3 3 3 2 2 2 ...
## $ Plain : int 3 5 1 1 1 1 1 3 3 3 ...
## $ Crisp : int 1 5 5 1 5 5 5 4 4 4 ...
## $ Regular : int 4 1 4 4 3 3 3 4 4 4 ...
## $ Sugar : int 1 2 3 2 1 2 2 1 1 1 ...
## $ Fruit : int 1 1 1 5 1 1 1 1 1 1 ...
## $ Process : int 3 5 2 2 3 3 3 2 2 2 ...
## $ Quality : int 5 2 5 5 5 5 5 4 4 4 ...
## $ Treat : int 1 1 4 5 5 5 5 2 2 2 ...
## $ Boring : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Nutritious: int 5 3 5 5 4 4 4 3 3 3 ...
summary(cereal_data)
## Cereals Filling Natural Fibre
## CornFlakes :27 Min. :1.000 Min. :1.000 Min. :1.000
## Weetabix :27 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000
## Vitabrit :25 Median :4.000 Median :4.000 Median :4.000
## NutriGrain :24 Mean :3.881 Mean :3.783 Mean :3.528
## SpecialK :23 3rd Qu.:4.500 3rd Qu.:4.000 3rd Qu.:4.000
## RiceBubbles:21 Max. :5.000 Max. :5.000 Max. :5.000
## (Other) :88
## Sweet Easy Salt Satisfying
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :2.000
## 1st Qu.:2.000 1st Qu.:4.000 1st Qu.:1.000 1st Qu.:3.000
## Median :2.000 Median :5.000 Median :2.000 Median :4.000
## Mean :2.506 Mean :4.532 Mean :1.991 Mean :4.004
## 3rd Qu.:3.000 3rd Qu.:5.000 3rd Qu.:3.000 3rd Qu.:5.000
## Max. :5.000 Max. :6.000 Max. :4.000 Max. :6.000
##
## Energy Fun Kids Soggy
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:1.000
## Median :4.000 Median :2.000 Median :4.000 Median :2.000
## Mean :3.643 Mean :2.617 Mean :3.843 Mean :2.255
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:5.000 3rd Qu.:3.000
## Max. :5.000 Max. :5.000 Max. :6.000 Max. :5.000
##
## Economical Health Family Calories
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:2.000
## Median :3.000 Median :4.000 Median :4.000 Median :3.000
## Mean :3.217 Mean :3.809 Mean :3.877 Mean :2.702
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:3.000
## Max. :5.000 Max. :5.000 Max. :6.000 Max. :5.000
##
## Plain Crisp Regular Sugar
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000
## Median :2.000 Median :3.000 Median :3.000 Median :2.000
## Mean :2.268 Mean :3.204 Mean :3.072 Mean :2.145
## 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:3.000
## Max. :5.000 Max. :6.000 Max. :5.000 Max. :5.000
##
## Fruit Process Quality Treat
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.00
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:2.00
## Median :1.000 Median :3.000 Median :4.000 Median :3.00
## Mean :1.694 Mean :2.936 Mean :3.694 Mean :2.63
## 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:3.00
## Max. :5.000 Max. :6.000 Max. :5.000 Max. :6.00
##
## Boring Nutritious
## Min. :1.00 Min. :1.000
## 1st Qu.:1.00 1st Qu.:3.000
## Median :2.00 Median :4.000
## Mean :1.83 Mean :3.664
## 3rd Qu.:2.00 3rd Qu.:4.000
## Max. :5.00 Max. :5.000
##
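The claims about missing values and scale range can be checked in code rather than by eye. Below is a minimal base-R sketch. `audit_ratings` is a hypothetical helper, and its default expected range of 1 to 5 is an assumption, not something taken from the questionnaire. Applied to `cereal_data`, it counts missing and out-of-range values for each numeric item and reports each item's maximum.

```r
# Sketch: audit Likert-type rating columns for missing values and for
# values outside an assumed scale [lo, hi] (defaults are assumptions).
audit_ratings <- function(df, lo = 1, hi = 5) {
  num <- df[vapply(df, is.numeric, logical(1))]  # keep numeric columns only
  data.frame(
    variable    = names(num),
    n_missing   = vapply(num, function(x) sum(is.na(x)), integer(1)),
    n_out_range = vapply(num, function(x) sum(x < lo | x > hi, na.rm = TRUE), integer(1)),
    max         = vapply(num, function(x) max(x, na.rm = TRUE), numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage (not run here): audit_ratings(cereal_data)
```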
## Drop the cereal name (a factor) and keep the numeric ratings for the correlation analysis.
## Note: columns 2:25 also exclude the last column, Nutritious.
revised_cereal_data <- cereal_data[,2:25]
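Selecting columns by position is fragile if the file layout changes. A small sketch selects them by type instead; `numeric_columns` is a hypothetical helper. Unlike `cereal_data[,2:25]`, it keeps every numeric column, including Nutritious. To reproduce the 24-variable set used here, Nutritious would then have to be dropped by name.

```r
# Sketch: select rating columns by type instead of by position.
# numeric_columns() is a hypothetical helper; it keeps every numeric column.
numeric_columns <- function(df) {
  df[vapply(df, is.numeric, logical(1))]
}

# Toy data frame shaped like cereal_data (values are made up):
toy <- data.frame(Cereals = c("AllBran", "Weetabix"),
                  Filling = c(5L, 1L), Nutritious = c(5L, 3L))
names(numeric_columns(toy))  # "Filling" "Nutritious"

# To match cereal_data[, 2:25], drop Nutritious by name (not run here):
# ratings <- numeric_columns(cereal_data)
# ratings <- ratings[setdiff(names(ratings), "Nutritious")]
```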
This section tests factorability, that is, whether the data contain enough shared variance for factor analysis (FCA) to be meaningful. The first check is the correlation matrix: the variables should show a substantial number of sizeable correlations with one another.
#### Compute the correlation matrix to check for substantial correlations between variables
revised_cereal_data.cormatrix <- cor(revised_cereal_data)
revised_cereal_data.cormatrix_rounded<- round(revised_cereal_data.cormatrix , digits=3)
print(revised_cereal_data.cormatrix)
## Filling Natural Fibre Sweet Easy
## Filling 1.00000000 0.53968982 0.55200307 0.19040004 0.236529426
## Natural 0.53968982 1.00000000 0.65228983 -0.09094192 0.230948116
## Fibre 0.55200307 0.65228983 1.00000000 -0.03739830 0.171721940
## Sweet 0.19040004 -0.09094192 -0.03739830 1.00000000 0.131211352
## Easy 0.23652943 0.23094812 0.17172194 0.13121135 1.000000000
## Salt -0.03626646 -0.21687200 -0.17488799 0.44399118 0.026936813
## Satisfying 0.64850720 0.46324724 0.40804658 0.18477668 0.362309849
## Energy 0.63675882 0.49354159 0.50373069 0.18496133 0.182630082
## Fun 0.26521397 0.08190321 0.06273083 0.32722313 0.248246218
## Kids 0.16000741 0.06029259 -0.09483085 0.12324275 0.251967889
## Soggy -0.05988555 0.06839973 -0.04166740 -0.08375192 -0.008804016
## Economical 0.05194244 0.10316137 -0.03396533 -0.23981376 0.092549751
## Health 0.54706871 0.68809770 0.68398369 -0.11562213 0.204021617
## Family 0.23287605 0.10708173 -0.01039870 0.04343192 0.235175664
## Calories 0.04721422 -0.16167366 -0.18654202 0.46731243 -0.015414590
## Plain -0.25064803 -0.13851302 -0.12284590 -0.28955897 0.017651539
## Crisp 0.12650057 0.02152927 0.04832402 0.26276634 0.252223659
## Regular 0.42049880 0.41763842 0.64837568 -0.02518025 0.105246439
## Sugar -0.07851945 -0.31680448 -0.22556725 0.64838267 -0.005731638
## Fruit 0.26116604 0.30015027 0.29314106 0.34650542 0.037197487
## Process -0.23171929 -0.30454423 -0.19522957 0.11851882 -0.048325631
## Quality 0.44321697 0.57909956 0.51319376 -0.07754712 0.171421388
## Treat 0.33764173 0.16939794 0.13967658 0.37627476 0.198897222
## Boring -0.17785084 -0.21758679 -0.09925867 -0.20033406 -0.167536447
## Salt Satisfying Energy Fun
## Filling -3.626646e-02 6.485072e-01 0.63675882 0.265213973
## Natural -2.168720e-01 4.632472e-01 0.49354159 0.081903207
## Fibre -1.748880e-01 4.080466e-01 0.50373069 0.062730827
## Sweet 4.439912e-01 1.847767e-01 0.18496133 0.327223134
## Easy 2.693681e-02 3.623098e-01 0.18263008 0.248246218
## Salt 1.000000e+00 5.371153e-05 -0.06713581 0.033474536
## Satisfying 5.371153e-05 1.000000e+00 0.59966696 0.354852245
## Energy -6.713581e-02 5.996670e-01 1.00000000 0.350327368
## Fun 3.347454e-02 3.548522e-01 0.35032737 1.000000000
## Kids 3.298221e-02 3.121943e-01 0.13057221 0.349965961
## Soggy 2.359707e-02 -1.425032e-02 -0.04592438 -0.098754958
## Economical -1.259049e-01 2.137707e-01 0.02641362 0.040700477
## Health -2.283768e-01 5.181843e-01 0.52424330 0.100955593
## Family -7.965423e-02 3.546910e-01 0.19136018 0.352476715
## Calories 4.380975e-01 1.219290e-02 0.03362541 0.113449859
## Plain 2.137203e-02 -1.795016e-01 -0.25577344 -0.322275476
## Crisp 1.033722e-01 2.762034e-01 0.24829752 0.402670109
## Regular -1.645302e-01 3.282587e-01 0.38571918 0.136731512
## Sugar 5.917709e-01 -8.119525e-02 -0.08606954 0.165290744
## Fruit 2.557426e-02 2.538439e-01 0.27438372 0.251421273
## Process 3.048367e-01 -1.649989e-01 -0.10157043 -0.002202497
## Quality -2.178523e-01 4.747221e-01 0.45703627 0.224503157
## Treat 1.278963e-01 3.821680e-01 0.32246211 0.586497175
## Boring 1.122315e-01 -3.156052e-01 -0.22338882 -0.298063613
## Kids Soggy Economical Health Family
## Filling 0.16000741 -0.059885555 0.05194244 0.547068708 0.23287605
## Natural 0.06029259 0.068399728 0.10316137 0.688097695 0.10708173
## Fibre -0.09483085 -0.041667404 -0.03396533 0.683983690 -0.01039870
## Sweet 0.12324275 -0.083751919 -0.23981376 -0.115622126 0.04343192
## Easy 0.25196789 -0.008804016 0.09254975 0.204021617 0.23517566
## Salt 0.03298221 0.023597066 -0.12590486 -0.228376777 -0.07965423
## Satisfying 0.31219433 -0.014250315 0.21377074 0.518184332 0.35469098
## Energy 0.13057221 -0.045924383 0.02641362 0.524243298 0.19136018
## Fun 0.34996596 -0.098754958 0.04070048 0.100955593 0.35247672
## Kids 1.00000000 0.087658404 0.30491206 -0.012760939 0.72701108
## Soggy 0.08765840 1.000000000 0.11715122 0.006146656 0.08138269
## Economical 0.30491206 0.117151217 1.00000000 0.192658638 0.23335601
## Health -0.01276094 0.006146656 0.19265864 1.000000000 0.08211150
## Family 0.72701108 0.081382689 0.23335601 0.082111496 1.00000000
## Calories 0.01435653 -0.079664961 -0.21047144 -0.307176155 -0.06072469
## Plain 0.02921320 0.346129827 0.23120114 -0.099609317 -0.02899279
## Crisp 0.30194580 -0.335632858 0.09269221 0.082416385 0.28807642
## Regular -0.02583127 -0.137300090 0.08029354 0.543222577 0.04374372
## Sugar -0.01589207 -0.094456381 -0.29255416 -0.376892968 -0.05448568
## Fruit -0.23212140 -0.137035700 -0.33848391 0.266341061 -0.12389300
## Process 0.02697130 0.058660437 -0.12542062 -0.289470005 -0.01293971
## Quality 0.11634764 -0.029808614 0.21549364 0.686304848 0.24107378
## Treat 0.28528848 -0.254809103 -0.03971178 0.214279460 0.30077242
## Boring -0.19372777 0.226885825 -0.02137835 -0.228589063 -0.24764977
## Calories Plain Crisp Regular Sugar
## Filling 0.04721422 -0.25064803 0.12650057 0.42049880 -0.078519448
## Natural -0.16167366 -0.13851302 0.02152927 0.41763842 -0.316804483
## Fibre -0.18654202 -0.12284590 0.04832402 0.64837568 -0.225567250
## Sweet 0.46731243 -0.28955897 0.26276634 -0.02518025 0.648382667
## Easy -0.01541459 0.01765154 0.25222366 0.10524644 -0.005731638
## Salt 0.43809745 0.02137203 0.10337222 -0.16453021 0.591770895
## Satisfying 0.01219290 -0.17950163 0.27620343 0.32825868 -0.081195255
## Energy 0.03362541 -0.25577344 0.24829752 0.38571918 -0.086069535
## Fun 0.11344986 -0.32227548 0.40267011 0.13673151 0.165290744
## Kids 0.01435653 0.02921320 0.30194580 -0.02583127 -0.015892067
## Soggy -0.07966496 0.34612983 -0.33563286 -0.13730009 -0.094456381
## Economical -0.21047144 0.23120114 0.09269221 0.08029354 -0.292554157
## Health -0.30717616 -0.09960932 0.08241638 0.54322258 -0.376892968
## Family -0.06072469 -0.02899279 0.28807642 0.04374372 -0.054485684
## Calories 1.00000000 -0.07619086 0.14705033 -0.16490865 0.525826174
## Plain -0.07619086 1.00000000 -0.20966191 -0.08026008 -0.146856923
## Crisp 0.14705033 -0.20966191 1.00000000 0.13330386 0.168930691
## Regular -0.16490865 -0.08026008 0.13330386 1.00000000 -0.090571672
## Sugar 0.52582617 -0.14685692 0.16893069 -0.09057167 1.000000000
## Fruit 0.12593786 -0.34308629 0.09022960 0.25474509 0.145330476
## Process 0.27389198 0.11318637 0.02484550 -0.14912239 0.369253388
## Quality -0.20135015 -0.22690816 0.13432982 0.44147633 -0.263389434
## Treat 0.19292930 -0.42989349 0.46808561 0.16655163 0.217100889
## Boring -0.02701569 0.33052554 -0.32358917 -0.09469787 -0.000921067
## Fruit Process Quality Treat Boring
## Filling 0.26116604 -0.231719289 0.44321697 0.33764173 -0.177850835
## Natural 0.30015027 -0.304544226 0.57909956 0.16939794 -0.217586787
## Fibre 0.29314106 -0.195229572 0.51319376 0.13967658 -0.099258673
## Sweet 0.34650542 0.118518821 -0.07754712 0.37627476 -0.200334059
## Easy 0.03719749 -0.048325631 0.17142139 0.19889722 -0.167536447
## Salt 0.02557426 0.304836682 -0.21785225 0.12789628 0.112231476
## Satisfying 0.25384390 -0.164998880 0.47472211 0.38216799 -0.315605190
## Energy 0.27438372 -0.101570429 0.45703627 0.32246211 -0.223388817
## Fun 0.25142127 -0.002202497 0.22450316 0.58649717 -0.298063613
## Kids -0.23212140 0.026971295 0.11634764 0.28528848 -0.193727771
## Soggy -0.13703570 0.058660437 -0.02980861 -0.25480910 0.226885825
## Economical -0.33848391 -0.125420621 0.21549364 -0.03971178 -0.021378353
## Health 0.26634106 -0.289470005 0.68630485 0.21427946 -0.228589063
## Family -0.12389300 -0.012939711 0.24107378 0.30077242 -0.247649766
## Calories 0.12593786 0.273891978 -0.20135015 0.19292930 -0.027015687
## Plain -0.34308629 0.113186375 -0.22690816 -0.42989349 0.330525544
## Crisp 0.09022960 0.024845497 0.13432982 0.46808561 -0.323589173
## Regular 0.25474509 -0.149122387 0.44147633 0.16655163 -0.094697872
## Sugar 0.14533048 0.369253388 -0.26338943 0.21710089 -0.000921067
## Fruit 1.00000000 -0.140204367 0.16460384 0.31255476 -0.260061699
## Process -0.14020437 1.000000000 -0.18304941 0.03109584 0.171709754
## Quality 0.16460384 -0.183049413 1.00000000 0.33407508 -0.284256014
## Treat 0.31255476 0.031095839 0.33407508 1.00000000 -0.359339636
## Boring -0.26006170 0.171709754 -0.28425601 -0.35933964 1.000000000
corrplot(revised_cereal_data.cormatrix, method="shade", type="full", addCoef.col = "blue", order ="AOE", bg ='grey')
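A rough way to quantify "a significant amount of correlation" is to count how many off-diagonal correlations exceed a threshold. The 0.3 cut-off below is a commonly quoted rule of thumb and is an assumption here; `strong_correlations` is a hypothetical helper. A variable with few or no strong correlations is a weak candidate for any factor.

```r
# Sketch: share of variable pairs with |r| above a threshold, and the
# number of such correlations per variable (threshold is an assumption).
strong_correlations <- function(R, threshold = 0.3) {
  strong <- abs(R) > threshold
  diag(strong) <- FALSE                          # ignore self-correlations
  list(
    share_of_pairs = mean(strong[upper.tri(strong)]),
    per_variable   = colSums(strong)
  )
}

# Usage (not run here): strong_correlations(revised_cereal_data.cormatrix)
```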
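Bartlett's test of sphericity checks whether the correlations are large enough overall. Its null hypothesis is that the correlation matrix is an identity matrix, meaning the variables are uncorrelated; in that case factor analysis is not appropriate. Using a 0.05 significance threshold, the p-value is reported as 0. This means it is too small to represent, not that it is literally zero. The null hypothesis is therefore rejected and we can proceed with FCA.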
revised_cereal_data.barttest <- cortest.bartlett(revised_cereal_data.cormatrix, n=nrow(cereal_data))
print(revised_cereal_data.barttest)
## $chisq
## [1] 2613.019
##
## $p.value
## [1] 0
##
## $df
## [1] 276
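The statistic above can be reproduced from the correlation matrix and the sample size with the standard formula: $\chi^2 = -\left((n-1) - \frac{2p+5}{6}\right)\ln|R|$ with $p(p-1)/2$ degrees of freedom. Below is a base-R sketch; `bartlett_sphericity` is a hypothetical helper. For $p = 24$ variables it gives 276 degrees of freedom, and the chi-square should be close to the 2613 printed above.

```r
# Sketch of Bartlett's test of sphericity from a correlation matrix R
# and sample size n.
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Usage (not run here):
# bartlett_sphericity(revised_cereal_data.cormatrix, n = nrow(cereal_data))
```

A p-value this far in the tail underflows double precision, which is why it prints as 0.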
Next is the Kaiser-Meyer-Olkin (KMO) test of sampling adequacy. The overall MSA is 0.84, which indicates a high proportion of common variance. Every variable's individual MSA also exceeds 0.5; the lowest is Soggy at 0.64. This supports running FCA on this data.
For reference, Kaiser proposed the following bands for interpreting MSA values:
- 0.00 to 0.49: unacceptable
- 0.50 to 0.59: miserable
- 0.60 to 0.69: mediocre
- 0.70 to 0.79: middling
- 0.80 to 0.89: meritorious
- 0.90 to 1.00: marvelous
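Kaiser's bands listed above can be applied to any vector of MSA values with `cut()`. The helper name `kaiser_band` is hypothetical. The break points are the band limits above, with intervals closed on the left and 1.00 included in the top band.

```r
# Sketch: map MSA/KMO values onto Kaiser's verbal bands.
kaiser_band <- function(msa) {
  cut(msa,
      breaks = c(0, 0.5, 0.6, 0.7, 0.8, 0.9, 1),
      labels = c("unacceptable", "miserable", "mediocre",
                 "middling", "meritorious", "marvelous"),
      right = FALSE, include.lowest = TRUE)
}

# Overall MSA, Soggy and Kids from the KMO output:
kaiser_band(c(0.84, 0.64, 0.68))  # meritorious, mediocre, mediocre
```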
KMO(revised_cereal_data.cormatrix_rounded)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = revised_cereal_data.cormatrix_rounded)
## Overall MSA = 0.84
## MSA for each item =
## Filling Natural Fibre Sweet Easy Salt
## 0.88 0.88 0.87 0.79 0.83 0.82
## Satisfying Energy Fun Kids Soggy Economical
## 0.90 0.90 0.85 0.68 0.64 0.72
## Health Family Calories Plain Crisp Regular
## 0.89 0.73 0.85 0.81 0.83 0.83
## Sugar Fruit Process Quality Treat Boring
## 0.78 0.77 0.80 0.90 0.87 0.88
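The MSA compares the squared correlations with the squared partial correlations. The partial correlations come from the inverse $S = R^{-1}$ as $q_{ij} = -S_{ij}/\sqrt{S_{ii}S_{jj}}$. Below is a base-R sketch; `kmo_msa` is a hypothetical helper. psych's `KMO()` is built on the same idea, so on `revised_cereal_data.cormatrix` the results should agree closely with the printed values.

```r
# Sketch of the Kaiser-Meyer-Olkin measure from a correlation matrix R.
kmo_msa <- function(R) {
  S <- solve(R)
  Q <- -S / sqrt(outer(diag(S), diag(S)))  # partial correlations
  diag(Q) <- 0
  R0 <- R
  diag(R0) <- 0                            # off-diagonal correlations only
  r2 <- R0^2
  q2 <- Q^2
  list(
    overall = sum(r2) / (sum(r2) + sum(q2)),
    item    = colSums(r2) / (colSums(r2) + colSums(q2))
  )
}

# Usage (not run here): kmo_msa(revised_cereal_data.cormatrix)
```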
The first step of the FCA is to choose the number of factors. We use the `fa.parallel` function from the psych package with principal axis factoring (`fm = 'pa'`). Common rules of thumb are to keep the factors whose eigenvalues exceed 1 (the Kaiser criterion) or to look for the elbow in the scree plot. Parallel analysis instead keeps the factors whose eigenvalues exceed those obtained from random data. The function draws the scree plot, and parallel analysis suggests 4 factors.
revised_cereal_data.fca <- fa.parallel(revised_cereal_data, fm = 'pa', fa = 'fa')
## Parallel analysis suggests that the number of factors = 4 and the number of components = NA
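Both the Kaiser criterion and parallel analysis can be sketched in base R. `suggest_n_factors` is a hypothetical helper. As a simplification, it uses eigenvalues of the full correlation matrix, whereas `fa.parallel(fa = 'fa')` works with the reduced correlation matrix, so the counts need not match psych exactly. Parallel analysis here keeps the leading eigenvalues that exceed the mean eigenvalues of random normal data of the same size.

```r
# Sketch: suggest a number of factors from raw data X.
# Simplification: eigenvalues of the full (unreduced) correlation matrix.
suggest_n_factors <- function(X, n_sim = 100, seed = 1) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  obs <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
  set.seed(seed)  # note: resets the global random number stream
  sim <- replicate(n_sim, {
    Z <- matrix(rnorm(n * p), n, p)
    eigen(cor(Z), symmetric = TRUE, only.values = TRUE)$values
  })
  rand <- rowMeans(sim)                       # mean random eigenvalues
  above <- obs > rand
  list(
    kaiser   = sum(obs > 1),                  # eigenvalues above 1
    parallel = if (all(above)) p else which(!above)[1] - 1
  )
}

# Usage (not run here): suggest_n_factors(revised_cereal_data)
```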
FCA is now run with 4 factors. A common first step is to fit without rotation and check whether the grouping of variables into factors makes sense. The "Cumulative Proportion" row shows that PA1, PA2 and PA3 together account for 91% of the variance explained by the four-factor solution (about 45% of the total variance, per "Cumulative Var"). PA1, PA2 and PA3 are therefore selected as the final factors.
Next we check the fit of the FCA:
- The root mean square of residuals (RMSR) is 0.04. This is acceptable, since it should be close to 0.
- The RMSEA (root mean square error of approximation) is 0.07. This is above the 0.05 often used to indicate a close fit, so the fit is adequate rather than close.
- The Tucker Lewis Index is 0.876, just below the commonly cited 0.90 threshold.
revised_cereal_data.factor <- fa(revised_cereal_data,nfactors = 4,rotate = "none",fm="pa")
print(revised_cereal_data.factor)
## Factor Analysis using method = pa
## Call: fa(r = revised_cereal_data, nfactors = 4, rotate = "none", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 PA3 PA4 h2 u2 com
## Filling 0.73 0.04 -0.09 0.22 0.60 0.40 1.2
## Natural 0.71 -0.29 -0.14 0.13 0.62 0.38 1.5
## Fibre 0.68 -0.28 -0.33 0.19 0.69 0.31 2.0
## Sweet 0.13 0.74 -0.18 0.14 0.62 0.38 1.3
## Easy 0.33 0.11 0.19 0.11 0.17 0.83 2.1
## Salt -0.19 0.53 -0.11 0.39 0.48 0.52 2.2
## Satisfying 0.74 0.12 0.14 0.19 0.62 0.38 1.3
## Energy 0.70 0.08 -0.09 0.14 0.53 0.47 1.1
## Fun 0.44 0.45 0.20 -0.17 0.46 0.54 2.7
## Kids 0.26 0.24 0.77 0.10 0.73 0.27 1.5
## Soggy -0.12 -0.21 0.16 0.39 0.24 0.76 2.1
## Economical 0.14 -0.25 0.46 0.09 0.31 0.69 1.9
## Health 0.78 -0.36 -0.14 0.10 0.77 0.23 1.5
## Family 0.35 0.17 0.67 0.02 0.59 0.41 1.7
## Calories -0.12 0.58 -0.15 0.21 0.42 0.58 1.5
## Plain -0.34 -0.33 0.23 0.42 0.46 0.54 3.5
## Crisp 0.33 0.41 0.21 -0.19 0.36 0.64 2.9
## Regular 0.56 -0.16 -0.20 0.08 0.38 0.62 1.5
## Sugar -0.21 0.77 -0.22 0.23 0.73 0.27 1.5
## Fruit 0.38 0.23 -0.48 -0.14 0.44 0.56 2.6
## Process -0.28 0.29 0.03 0.21 0.21 0.79 2.8
## Quality 0.70 -0.19 0.02 -0.02 0.53 0.47 1.1
## Treat 0.51 0.52 0.07 -0.22 0.59 0.41 2.4
## Boring -0.41 -0.22 -0.08 0.33 0.34 0.66 2.6
##
## PA1 PA2 PA3 PA4
## SS loadings 5.47 3.30 2.03 1.10
## Proportion Var 0.23 0.14 0.08 0.05
## Cumulative Var 0.23 0.37 0.45 0.50
## Proportion Explained 0.46 0.28 0.17 0.09
## Cumulative Proportion 0.46 0.74 0.91 1.00
##
## Mean item complexity = 1.9
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 276 and the objective function was 11.6 with Chi Square of 2613.02
## The degrees of freedom for the model are 186 and the objective function was 1.7
##
## The root mean square of the residuals (RMSR) is 0.04
## The df corrected root mean square of the residuals is 0.05
##
## The harmonic number of observations is 235 with the empirical chi square 199.55 with prob < 0.24
## The total number of observations was 235 with Likelihood Chi Square = 378.03 with prob < 3.5e-15
##
## Tucker Lewis Index of factoring reliability = 0.876
## RMSEA index = 0.07 and the 90 % confidence intervals are 0.057 0.076
## BIC = -637.46
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy
## PA1 PA2 PA3 PA4
## Correlation of (regression) scores with factors 0.97 0.94 0.92 0.83
## Multiple R square of scores with factors 0.93 0.89 0.84 0.68
## Minimum correlation of possible factor scores 0.87 0.78 0.68 0.37
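Two of the fit indices above have simple textbook forms. RMSR is the root mean square of the off-diagonal residuals $R - LL^\top$ for a loading matrix $L$. RMSEA is commonly estimated as $\sqrt{\max\left(\frac{\chi^2 - df}{df\,(N-1)}, 0\right)}$. Below is a base-R sketch; `rmsr` and `rmsea` are hypothetical helpers. psych applies its own corrections, so small differences from the printed values are expected.

```r
# Sketch: root mean square of off-diagonal residuals for loadings L.
rmsr <- function(R, L) {
  resid <- R - L %*% t(L)
  sqrt(mean(resid[lower.tri(resid)]^2))
}

# Sketch: textbook RMSEA point estimate.
rmsea <- function(chisq, df, N) {
  sqrt(max((chisq - df) / (df * (N - 1)), 0))
}

# Likelihood chi-square, model df and N from the output above:
rmsea(378.03, 186, 235)  # about 0.066, which rounds to the printed 0.07

# Usage (not run here):
# rmsr(revised_cereal_data.cormatrix, unclass(revised_cereal_data.factor$loadings))
```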
print(revised_cereal_data.factor$loadings, cutoff = 0.5) ## Only Treat loads above 0.5 on more than one factor (PA1 and PA2)
##
## Loadings:
## PA1 PA2 PA3 PA4
## Filling 0.733
## Natural 0.706
## Fibre 0.683
## Sweet 0.742
## Easy
## Salt 0.535
## Satisfying 0.741
## Energy 0.705
## Fun
## Kids 0.768
## Soggy
## Economical
## Health 0.775
## Family 0.665
## Calories 0.583
## Plain
## Crisp
## Regular 0.555
## Sugar 0.766
## Fruit
## Process
## Quality 0.704
## Treat 0.515 0.525
## Boring
##
## PA1 PA2 PA3 PA4
## SS loadings 5.466 3.304 2.028 1.095
## Proportion Var 0.228 0.138 0.085 0.046
## Cumulative Var 0.228 0.365 0.450 0.496
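Cross-loadings like Treat's can be flagged automatically. The sketch below uses three rows copied from the unrotated loadings printed above; `cross_loaders` is a hypothetical helper that returns items reaching the cutoff on more than one factor.

```r
# Sketch: items whose absolute loading reaches the cutoff on 2+ factors.
cross_loaders <- function(L, cutoff = 0.5) {
  hits <- rowSums(abs(L) >= cutoff)
  names(hits)[hits > 1]
}

# PA1-PA4 loadings for three items, from the unrotated output:
L <- rbind(
  Treat  = c(0.51, 0.52, 0.07, -0.22),
  Sweet  = c(0.13, 0.74, -0.18, 0.14),
  Health = c(0.78, -0.36, -0.14, 0.10)
)
cross_loaders(L)  # "Treat"

# Usage on the full solution (not run here):
# cross_loaders(unclass(revised_cereal_data.factor$loadings))
```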
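We next try FCA with rotation. The call below spells the rotation "verimax". psych does not recognise that name, warns "Specified rotation not found", and falls back to rotate = 'none'. The output is therefore identical to the unrotated solution (RMSR 0.04, RMSEA 0.07, Tucker Lewis Index 0.876). It does not show whether rotation changes the factor structure; checking that requires rotate = "varimax". In any case, fit statistics such as RMSR, RMSEA and TLI do not change under an orthogonal rotation. What rotation changes is how the loadings are distributed across the factors.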
revised_cereal_data.factor_rotate <- fa(revised_cereal_data,nfactors = 4,rotate = "verimax",fm="pa")
## Specified rotation not found, rotate='none' used
print(revised_cereal_data.factor_rotate)
## Factor Analysis using method = pa
## Call: fa(r = revised_cereal_data, nfactors = 4, rotate = "verimax",
## fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 PA3 PA4 h2 u2 com
## Filling 0.73 0.04 -0.09 0.22 0.60 0.40 1.2
## Natural 0.71 -0.29 -0.14 0.13 0.62 0.38 1.5
## Fibre 0.68 -0.28 -0.33 0.19 0.69 0.31 2.0
## Sweet 0.13 0.74 -0.18 0.14 0.62 0.38 1.3
## Easy 0.33 0.11 0.19 0.11 0.17 0.83 2.1
## Salt -0.19 0.53 -0.11 0.39 0.48 0.52 2.2
## Satisfying 0.74 0.12 0.14 0.19 0.62 0.38 1.3
## Energy 0.70 0.08 -0.09 0.14 0.53 0.47 1.1
## Fun 0.44 0.45 0.20 -0.17 0.46 0.54 2.7
## Kids 0.26 0.24 0.77 0.10 0.73 0.27 1.5
## Soggy -0.12 -0.21 0.16 0.39 0.24 0.76 2.1
## Economical 0.14 -0.25 0.46 0.09 0.31 0.69 1.9
## Health 0.78 -0.36 -0.14 0.10 0.77 0.23 1.5
## Family 0.35 0.17 0.67 0.02 0.59 0.41 1.7
## Calories -0.12 0.58 -0.15 0.21 0.42 0.58 1.5
## Plain -0.34 -0.33 0.23 0.42 0.46 0.54 3.5
## Crisp 0.33 0.41 0.21 -0.19 0.36 0.64 2.9
## Regular 0.56 -0.16 -0.20 0.08 0.38 0.62 1.5
## Sugar -0.21 0.77 -0.22 0.23 0.73 0.27 1.5
## Fruit 0.38 0.23 -0.48 -0.14 0.44 0.56 2.6
## Process -0.28 0.29 0.03 0.21 0.21 0.79 2.8
## Quality 0.70 -0.19 0.02 -0.02 0.53 0.47 1.1
## Treat 0.51 0.52 0.07 -0.22 0.59 0.41 2.4
## Boring -0.41 -0.22 -0.08 0.33 0.34 0.66 2.6
##
## PA1 PA2 PA3 PA4
## SS loadings 5.47 3.30 2.03 1.10
## Proportion Var 0.23 0.14 0.08 0.05
## Cumulative Var 0.23 0.37 0.45 0.50
## Proportion Explained 0.46 0.28 0.17 0.09
## Cumulative Proportion 0.46 0.74 0.91 1.00
##
## Mean item complexity = 1.9
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 276 and the objective function was 11.6 with Chi Square of 2613.02
## The degrees of freedom for the model are 186 and the objective function was 1.7
##
## The root mean square of the residuals (RMSR) is 0.04
## The df corrected root mean square of the residuals is 0.05
##
## The harmonic number of observations is 235 with the empirical chi square 199.55 with prob < 0.24
## The total number of observations was 235 with Likelihood Chi Square = 378.03 with prob < 3.5e-15
##
## Tucker Lewis Index of factoring reliability = 0.876
## RMSEA index = 0.07 and the 90 % confidence intervals are 0.057 0.076
## BIC = -637.46
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy
## PA1 PA2 PA3 PA4
## Correlation of (regression) scores with factors 0.97 0.94 0.92 0.83
## Multiple R square of scores with factors 0.93 0.89 0.84 0.68
## Minimum correlation of possible factor scores 0.87 0.78 0.68 0.37
print(revised_cereal_data.factor_rotate$loadings, cutoff = 0.5)
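Base R's `stats::varimax()` shows what a varimax rotation does, independently of psych. The sketch applies it to a subset of the unrotated PA1 to PA3 loadings printed above. An orthogonal rotation redistributes the loadings across factors but leaves each item's communality (the row sum of squared loadings, `h2`) unchanged. In psych itself the equivalent is `fa(..., rotate = "varimax")`.

```r
# Sketch: varimax rotation of six items' unrotated PA1-PA3 loadings.
L <- rbind(
  Filling = c(0.73, 0.04, -0.09),
  Natural = c(0.71, -0.29, -0.14),
  Sweet   = c(0.13, 0.74, -0.18),
  Sugar   = c(-0.21, 0.77, -0.22),
  Kids    = c(0.26, 0.24, 0.77),
  Family  = c(0.35, 0.17, 0.67)
)
rot <- stats::varimax(L)
L_rot <- unclass(rot$loadings)   # plain matrix of rotated loadings
round(L_rot, 2)

# Communalities are preserved by the orthogonal rotation:
all.equal(rowSums(L^2), rowSums(L_rot^2))  # TRUE
```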
Our selected factors are PA1, PA2 and PA3. The next step is the factor mapping. With 0.5 as the loading cut-off, the final mapping is:
- PA1: Health, Satisfying, Filling, Natural, Energy, Quality, Fibre and Regular (proposed factor name: Healthy)
- PA2: Sugar, Sweet, Calories, Salt and Treat (proposed factor name: Not Calorie Conscious)
- PA3: Kids and Family (proposed factor name: Family Oriented)

Treat also loads 0.515 on PA1 but is assigned to PA2, where its loading (0.525) is higher.
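The mapping rule above can be expressed as code: assign each item to the factor with its largest absolute loading, keeping only items at or above the cut-off. `assign_items` is a hypothetical helper. The loadings are five rows copied (rounded) from the unrotated output, and the result follows the rule that puts Treat on PA2.

```r
# Sketch: map items to the factor with the largest absolute loading,
# dropping items whose best loading is below the cutoff.
assign_items <- function(L, cutoff = 0.5) {
  best <- apply(abs(L), 1, which.max)
  keep <- apply(abs(L), 1, max) >= cutoff
  split(rownames(L)[keep], colnames(L)[best[keep]])
}

L <- rbind(
  Health = c(0.78, -0.36, -0.14),
  Treat  = c(0.51, 0.52, 0.07),
  Sugar  = c(-0.21, 0.77, -0.22),
  Kids   = c(0.26, 0.24, 0.77),
  Easy   = c(0.33, 0.11, 0.19)
)
colnames(L) <- c("PA1", "PA2", "PA3")
assign_items(L)
# $PA1 "Health"; $PA2 "Treat" "Sugar"; $PA3 "Kids"  (Easy is dropped)
```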
fa.diagram(revised_cereal_data.factor,cut = 0.5)
Having identified the latent factors PA1, PA2 and PA3, we now score each respondent on each factor by averaging the ratings of the items mapped to it.
## Column indices of the items that load on each factor.
## Note: "Filling" is listed twice, so it carries double weight in the Healthy average.
myfactorColumnno1 <- match(c("Health", "Satisfying", "Filling", "Natural", "Filling", "Energy", "Quality", "Fibre", "Regular"), names(cereal_data))
myfactorColumnno2 <- match(c("Sugar", "Sweet", "Calories", "Salt", "Treat"), names(cereal_data))
myfactorColumnno3 <- match(c("Kids", "Family"), names(cereal_data))
cereal_data$Healthy <- rowMeans(cereal_data[, myfactorColumnno1])
cereal_data$Non_Caclorie_Conscious <- rowMeans(cereal_data[, myfactorColumnno2])
cereal_data$Family_Oriented <- rowMeans(cereal_data[, myfactorColumnno3])
str(cereal_data)
## 'data.frame': 235 obs. of 29 variables:
## $ Cereals : Factor w/ 12 levels "AllBran","CMuesli",..: 12 9 9 2 3 8 9 9 8 3 ...
## $ Filling : int 5 1 5 5 4 4 4 4 4 4 ...
## $ Natural : int 5 2 4 5 5 4 4 3 3 3 ...
## $ Fibre : int 5 2 5 5 3 4 3 3 3 3 ...
## $ Sweet : int 1 1 5 3 2 2 2 2 2 2 ...
## $ Easy : int 2 5 5 5 5 5 5 5 5 5 ...
## $ Salt : int 1 2 3 2 2 2 1 1 1 1 ...
## $ Satisfying : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Energy : int 4 1 5 5 4 4 5 4 4 4 ...
## $ Fun : int 1 1 5 5 5 5 5 4 4 4 ...
## $ Kids : int 4 5 5 5 5 5 5 5 5 5 ...
## $ Soggy : int 5 3 3 3 1 1 1 1 1 1 ...
## $ Economical : int 5 5 3 3 5 5 5 3 3 3 ...
## $ Health : int 5 2 5 5 5 4 5 4 4 4 ...
## $ Family : int 5 5 5 5 3 5 5 5 5 5 ...
## $ Calories : int 1 1 1 1 3 3 3 2 2 2 ...
## $ Plain : int 3 5 1 1 1 1 1 3 3 3 ...
## $ Crisp : int 1 5 5 1 5 5 5 4 4 4 ...
## $ Regular : int 4 1 4 4 3 3 3 4 4 4 ...
## $ Sugar : int 1 2 3 2 1 2 2 1 1 1 ...
## $ Fruit : int 1 1 1 5 1 1 1 1 1 1 ...
## $ Process : int 3 5 2 2 3 3 3 2 2 2 ...
## $ Quality : int 5 2 5 5 5 5 5 4 4 4 ...
## $ Treat : int 1 1 4 5 5 5 5 2 2 2 ...
## $ Boring : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Nutritious : int 5 3 5 5 4 4 4 3 3 3 ...
## $ Healthy : num 4.78 1.89 4.78 4.89 4.22 ...
## $ Non_Caclorie_Conscious: num 1 1.4 3.2 2.6 2.6 2.8 2.6 1.6 1.6 1.6 ...
## $ Family_Oriented : num 4.5 5 5 5 4 5 5 5 5 5 ...
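The first respondent's Healthy score can be checked by hand against the `str()` output. Their eight distinct PA1 ratings average 4.75. Because "Filling" appears twice in `myfactorColumnno1`, it enters the average twice, which gives the 4.78 shown above.

```r
# Worked check using respondent 1's ratings from str(cereal_data).
row1 <- c(Health = 5, Satisfying = 5, Filling = 5, Natural = 5,
          Energy = 4, Quality = 5, Fibre = 5, Regular = 4)
with_duplicate <- mean(c(row1, Filling = 5))  # 43 / 9, about 4.78 (as printed)
equal_weights  <- mean(row1)                   # 38 / 8 = 4.75
```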
### Build a table with the mean score on each of the three factors for each cereal
myresult <- aggregate(cbind(cereal_data$Healthy,cereal_data$Non_Caclorie_Conscious,cereal_data$Family_Oriented), by = list(cereal_data$Cereals), FUN = mean)
colnames(myresult)[2:4] <-c("Healthy", "Non Caclorie Conscious", "Family Oriented")
myresult$Healthy <- ceiling(myresult$Healthy)
myresult$`Non Caclorie Conscious` <- ceiling(myresult$`Non Caclorie Conscious`)
myresult$`Family Oriented` <- ceiling(myresult$`Family Oriented`)
## Interpret the brand survey outcome. After the means are rounded up with ceiling(),
## scores of 1 to 3 are coded 'No' and scores of 4 to 5 are coded 'Yes'.
## Recode the data (car::recode string syntax)
myresult$Healthy <- car::recode(myresult$Healthy, "1='No'; 2='No'; 3='No'; 4='Yes'; 5='Yes'")
myresult$`Non Caclorie Conscious` <- car::recode(myresult$`Non Caclorie Conscious`, "1='No'; 2='No'; 3='No'; 4='Yes'; 5='Yes'")
myresult$`Family Oriented` <- car::recode(myresult$`Family Oriented`, "1='No'; 2='No'; 3='No'; 4='Yes'; 5='Yes'")
myresult
## Group.1 Healthy Non Caclorie Conscious Family Oriented
## 1 AllBran Yes No No
## 2 CMuesli Yes No Yes
## 3 CornFlakes Yes No Yes
## 4 JustRight Yes No Yes
## 5 Komplete Yes No No
## 6 NutriGrain Yes Yes Yes
## 7 PMuesli Yes No Yes
## 8 RiceBubbles No No Yes
## 9 SpecialK Yes No Yes
## 10 Sustain Yes No Yes
## 11 Vitabrit Yes No Yes
## 12 Weetabix Yes No Yes
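The `ceiling()` and `recode()` steps above reduce to one threshold, because `ceiling(m)` is 3 or less exactly when `m` is 3 or less. So a cereal is coded 'Yes' precisely when its unrounded mean exceeds 3. A sketch of that single cut follows; `code_yes_no` is a hypothetical helper. Unlike the `recode()` rules, which only cover 1 to 5, it also handles a mean that rounds up to 6, which is possible because some items reach 6.

```r
# Sketch: Yes/No coding directly on the unrounded mean.
code_yes_no <- function(m, threshold = 3) ifelse(m > threshold, "Yes", "No")

code_yes_no(c(2.5, 3, 3.01, 5.2))  # "No" "No" "Yes" "Yes"
```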
Factor scores can also be used to interpret where each cereal sits on each factor.
revised_cereal_data.factorscore <- revised_cereal_data.factor$scores
mkwithscore <- cbind(as.data.frame(cereal_data$Cereals), revised_cereal_data.factorscore)
## Mean absolute factor score per cereal (V1 = PA1, V2 = PA2, V3 = PA3).
## Note: abs() discards the sign, so these values measure distance from the
## sample average rather than whether a cereal scores high or low on a factor.
myresultmk <- aggregate(cbind(abs(mkwithscore$PA1),abs(mkwithscore$PA2),abs(mkwithscore$PA3)), by = list(cereal_data$Cereals), FUN = mean)
## Copy the Yes/No labels from myresult (both tables are sorted by cereal)
myresultmk$Healthy <- myresult$Healthy
myresultmk$`Non Caclorie Conscious` <- myresult$`Non Caclorie Conscious`
myresultmk$`Family Oriented` <- myresult$`Family Oriented`
## Label a factor 'Yes' for a cereal when its mean absolute score exceeds 0.75
myresultmk$HealthyDerived <- ifelse(myresultmk$V1 > 0.75, 'Yes', 'No')
myresultmk$NonHealthyDerived <- ifelse(myresultmk$V2 > 0.75, 'Yes', 'No')
myresultmk$FamilyDerived <- ifelse(myresultmk$V3 > 0.75, 'Yes', 'No')
myresultmk
## Group.1 V1 V2 V3 Healthy
## 1 AllBran 0.6749282 0.7349573 1.0559282 Yes
## 2 CMuesli 0.9974637 0.7478035 0.6941258 Yes
## 3 CornFlakes 0.9150257 0.7442646 0.7249563 Yes
## 4 JustRight 0.7102331 0.7527087 0.6056755 Yes
## 5 Komplete 0.7004994 0.5258570 1.2313606 Yes
## 6 NutriGrain 0.8452656 1.1056192 0.5889176 Yes
## 7 PMuesli 0.7108973 0.8574756 0.9349777 Yes
## 8 RiceBubbles 0.9838677 0.6355220 1.0075413 No
## 9 SpecialK 0.7234387 0.4538966 0.6083918 Yes
## 10 Sustain 0.8404615 0.5869119 0.7631423 Yes
## 11 Vitabrit 0.5985314 0.9789122 0.4783065 Yes
## 12 Weetabix 0.4592862 0.8696765 0.5457632 Yes
## Non Caclorie Conscious Family Oriented HealthyDerived NonHealthyDerived
## 1 No No No No
## 2 No Yes Yes No
## 3 No Yes Yes No
## 4 No Yes No Yes
## 5 No No No No
## 6 Yes Yes Yes Yes
## 7 No Yes No Yes
## 8 No Yes Yes No
## 9 No Yes No No
## 10 No Yes Yes No
## 11 No Yes No Yes
## 12 No Yes No Yes
## FamilyDerived
## 1 Yes
## 2 No
## 3 No
## 4 No
## 5 Yes
## 6 No
## 7 Yes
## 8 Yes
## 9 No
## 10 Yes
## 11 No
## 12 No
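Averaging absolute factor scores, as in the code above, measures how far a cereal's ratings sit from the sample average, not in which direction. This may explain why the derived labels disagree with the mean-rating table for several cereals. The sketch below uses hypothetical scores for three made-up cereals. A cereal rated well below average gets the same absolute mean as one rated well above it, while the signed mean separates them. With the real data, signed means could be taken with `aggregate(cbind(PA1, PA2, PA3) ~ cereal, data = ..., FUN = mean)` on a data frame of scores and cereal names.

```r
# Sketch with hypothetical PA1 scores for three made-up cereals.
scores <- data.frame(
  cereal = c("A", "A", "B", "B", "C", "C"),
  PA1    = c(1.2, 0.8, -1.2, -0.8, 1.0, -1.0)
)
signed   <- tapply(scores$PA1, scores$cereal, mean)       # A: 1, B: -1, C: 0
absolute <- tapply(abs(scores$PA1), scores$cereal, mean)  # A: 1, B: 1, C: 1
rbind(signed, absolute)
```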