Cereal_Brand_Analysis

This section attaches the libraries used throughout the analysis below.

## Warning: package 'ggplot2' was built under R version 3.4.1

## Warning: package 'ggthemes' was built under R version 3.4.1

## Warning: package 'scales' was built under R version 3.4.1

## Warning: package 'dplyr' was built under R version 3.4.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Warning: package 'mice' was built under R version 3.4.2

## Loading required package: lattice

## Warning: package 'randomForest' was built under R version 3.4.1

## randomForest 4.6-12

## Type rfNews() to see new features/changes/bug fixes.

## 
## Attaching package: 'randomForest'

## The following object is masked from 'package:dplyr':
## 
##     combine

## The following object is masked from 'package:ggplot2':
## 
##     margin

## Warning: package 'rpart' was built under R version 3.4.2

## Warning: package 'ROCR' was built under R version 3.4.1

## Loading required package: gplots

## Warning: package 'gplots' was built under R version 3.4.1

## 
## Attaching package: 'gplots'

## The following object is masked from 'package:stats':
## 
##     lowess

## Warning: package 'rpart.plot' was built under R version 3.4.2

## Warning: package 'corrr' was built under R version 3.4.1

## Warning: package 'corrplot' was built under R version 3.4.2

## corrplot 0.84 loaded

## Warning: package 'glue' was built under R version 3.4.2

## 
## Attaching package: 'glue'

## The following object is masked from 'package:dplyr':
## 
##     collapse

## Warning: package 'caTools' was built under R version 3.4.1

## Warning: package 'data.table' was built under R version 3.4.2

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

## Loading required package: knitr

## Warning: package 'knitr' was built under R version 3.4.2

## Loading required package: geosphere

## Warning: package 'geosphere' was built under R version 3.4.2

## Loading required package: gmapsdistance

## Warning: package 'gmapsdistance' was built under R version 3.4.2

## Loading required package: tidyr

## Warning: package 'tidyr' was built under R version 3.4.2

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:mice':
## 
##     complete

## Warning: package 'car' was built under R version 3.4.2

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

## Warning: package 'caret' was built under R version 3.4.1

## Warning: package 'gclus' was built under R version 3.4.1

## Loading required package: cluster

## Warning: package 'visdat' was built under R version 3.4.1

## Warning: package 'psych' was built under R version 3.4.2

## 
## Attaching package: 'psych'

## The following object is masked from 'package:car':
## 
##     logit

## The following object is masked from 'package:randomForest':
## 
##     outlier

## The following objects are masked from 'package:scales':
## 
##     alpha, rescale

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

## Warning: package 'leaflet' was built under R version 3.4.1

## Warning: package 'leaflet.extras' was built under R version 3.4.1

## Warning: package 'GPArotation' was built under R version 3.4.1

## Warning: package 'MVN' was built under R version 3.4.2

## sROC 0.1-2 loaded

## 
## Attaching package: 'MVN'

## The following object is masked from 'package:psych':
## 
##     mardia

## Warning: package 'MASS' was built under R version 3.4.1

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

## Warning: package 'psy' was built under R version 3.4.1

## 
## Attaching package: 'psy'

## The following object is masked from 'package:psych':
## 
##     wkappa

## Warning: package 'corpcor' was built under R version 3.4.1

## Warning: package 'fastmatch' was built under R version 3.4.1

## 
## Attaching package: 'fastmatch'

## The following object is masked from 'package:dplyr':
## 
##     coalesce

## Warning: package 'plyr' was built under R version 3.4.1

## -------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## -------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

This section loads the data and takes a first look at it. Every variable except the first column (the cereal name) is numeric and ordinal: each one is a rating score, nominally on the same 1 to 5 scale. The summary shows that there are no missing records, which makes the analysis easier. Note, however, that the summary reports a maximum of 6 for several variables (Easy, Satisfying, Kids, Family, Crisp, Process and Treat), so those values are worth checking before treating the data as free of outliers.

#### Loading the data for processing
cereal_data <- read.csv('cereal.csv')

## Look at the variables and their summary to understand them. The summary shows no missing data; a few variables have a maximum of 6.
## It also shows that the first variable, the cereal name, is a factor and all the others are numeric.
str(cereal_data)

## 'data.frame':    235 obs. of  26 variables:
##  $ Cereals   : Factor w/ 12 levels "AllBran","CMuesli",..: 12 9 9 2 3 8 9 9 8 3 ...
##  $ Filling   : int  5 1 5 5 4 4 4 4 4 4 ...
##  $ Natural   : int  5 2 4 5 5 4 4 3 3 3 ...
##  $ Fibre     : int  5 2 5 5 3 4 3 3 3 3 ...
##  $ Sweet     : int  1 1 5 3 2 2 2 2 2 2 ...
##  $ Easy      : int  2 5 5 5 5 5 5 5 5 5 ...
##  $ Salt      : int  1 2 3 2 2 2 1 1 1 1 ...
##  $ Satisfying: int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Energy    : int  4 1 5 5 4 4 5 4 4 4 ...
##  $ Fun       : int  1 1 5 5 5 5 5 4 4 4 ...
##  $ Kids      : int  4 5 5 5 5 5 5 5 5 5 ...
##  $ Soggy     : int  5 3 3 3 1 1 1 1 1 1 ...
##  $ Economical: int  5 5 3 3 5 5 5 3 3 3 ...
##  $ Health    : int  5 2 5 5 5 4 5 4 4 4 ...
##  $ Family    : int  5 5 5 5 3 5 5 5 5 5 ...
##  $ Calories  : int  1 1 1 1 3 3 3 2 2 2 ...
##  $ Plain     : int  3 5 1 1 1 1 1 3 3 3 ...
##  $ Crisp     : int  1 5 5 1 5 5 5 4 4 4 ...
##  $ Regular   : int  4 1 4 4 3 3 3 4 4 4 ...
##  $ Sugar     : int  1 2 3 2 1 2 2 1 1 1 ...
##  $ Fruit     : int  1 1 1 5 1 1 1 1 1 1 ...
##  $ Process   : int  3 5 2 2 3 3 3 2 2 2 ...
##  $ Quality   : int  5 2 5 5 5 5 5 4 4 4 ...
##  $ Treat     : int  1 1 4 5 5 5 5 2 2 2 ...
##  $ Boring    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Nutritious: int  5 3 5 5 4 4 4 3 3 3 ...

summary(cereal_data)

##         Cereals      Filling         Natural          Fibre      
##  CornFlakes :27   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  Weetabix   :27   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000  
##  Vitabrit   :25   Median :4.000   Median :4.000   Median :4.000  
##  NutriGrain :24   Mean   :3.881   Mean   :3.783   Mean   :3.528  
##  SpecialK   :23   3rd Qu.:4.500   3rd Qu.:4.000   3rd Qu.:4.000  
##  RiceBubbles:21   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##  (Other)    :88                                                  
##      Sweet            Easy            Salt         Satisfying   
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :2.000  
##  1st Qu.:2.000   1st Qu.:4.000   1st Qu.:1.000   1st Qu.:3.000  
##  Median :2.000   Median :5.000   Median :2.000   Median :4.000  
##  Mean   :2.506   Mean   :4.532   Mean   :1.991   Mean   :4.004  
##  3rd Qu.:3.000   3rd Qu.:5.000   3rd Qu.:3.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :6.000   Max.   :4.000   Max.   :6.000  
##                                                                 
##      Energy           Fun             Kids           Soggy      
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:2.000   1st Qu.:3.000   1st Qu.:1.000  
##  Median :4.000   Median :2.000   Median :4.000   Median :2.000  
##  Mean   :3.643   Mean   :2.617   Mean   :3.843   Mean   :2.255  
##  3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:5.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :5.000   Max.   :6.000   Max.   :5.000  
##                                                                 
##    Economical        Health          Family         Calories    
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :3.000   Median :4.000   Median :4.000   Median :3.000  
##  Mean   :3.217   Mean   :3.809   Mean   :3.877   Mean   :2.702  
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :5.000   Max.   :6.000   Max.   :5.000  
##                                                                 
##      Plain           Crisp          Regular          Sugar      
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.000  
##  Median :2.000   Median :3.000   Median :3.000   Median :2.000  
##  Mean   :2.268   Mean   :3.204   Mean   :3.072   Mean   :2.145  
##  3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:3.000  
##  Max.   :5.000   Max.   :6.000   Max.   :5.000   Max.   :5.000  
##                                                                 
##      Fruit          Process         Quality          Treat     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.00  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:3.000   1st Qu.:2.00  
##  Median :1.000   Median :3.000   Median :4.000   Median :3.00  
##  Mean   :1.694   Mean   :2.936   Mean   :3.694   Mean   :2.63  
##  3rd Qu.:3.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:3.00  
##  Max.   :5.000   Max.   :6.000   Max.   :5.000   Max.   :6.00  
##                                                                
##      Boring       Nutritious   
##  Min.   :1.00   Min.   :1.000  
##  1st Qu.:1.00   1st Qu.:3.000  
##  Median :2.00   Median :4.000  
##  Mean   :1.83   Mean   :3.664  
##  3rd Qu.:2.00   3rd Qu.:4.000  
##  Max.   :5.00   Max.   :5.000  
##
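The checks described above can be made explicit rather than read off the summary by eye. The following is a minimal sketch on a toy data frame (the values, and the assumed valid range of 1 to 5, are illustrative assumptions, not taken from cereal.csv). It counts missing and out-of-range values per column.

```r
# Toy stand-in for the ratings; the real data come from cereal.csv.
ratings <- data.frame(
  Filling = c(5, 1, 4, 3),
  Easy    = c(2, 6, 5, 5),   # one value above the assumed maximum of 5
  Salt    = c(1, 2, NA, 4)   # one missing value
)

# Assumed valid range of the rating scale (an assumption for this sketch).
scale_min <- 1
scale_max <- 5

# Number of missing values in each column.
missing_per_column <- colSums(is.na(ratings))

# Number of non-missing values outside the assumed range in each column.
out_of_range_per_column <- vapply(
  ratings,
  function(x) sum(x < scale_min | x > scale_max, na.rm = TRUE),
  integer(1)
)

print(missing_per_column)
print(out_of_range_per_column)
```

Applied to the real data, the same two vectors would show at a glance which columns contain the value 6.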

## Drop the cereal name (a factor) and keep the numeric ratings for the correlation analysis. Note that columns 2:25 also leave out column 26 (Nutritious).
revised_cereal_data <- cereal_data[,2:25]

This section tests factorability, that is, whether latent factors exist so that the data can be processed through factor analysis (FCA). The first test looks at the correlations between variables to ensure that a substantial amount of correlation exists among them.

#### Compute the correlation matrix to see whether substantial correlations exist
revised_cereal_data.cormatrix <- cor(revised_cereal_data)
revised_cereal_data.cormatrix_rounded<- round(revised_cereal_data.cormatrix , digits=3)
print(revised_cereal_data.cormatrix)

##                Filling     Natural       Fibre       Sweet         Easy
## Filling     1.00000000  0.53968982  0.55200307  0.19040004  0.236529426
## Natural     0.53968982  1.00000000  0.65228983 -0.09094192  0.230948116
## Fibre       0.55200307  0.65228983  1.00000000 -0.03739830  0.171721940
## Sweet       0.19040004 -0.09094192 -0.03739830  1.00000000  0.131211352
## Easy        0.23652943  0.23094812  0.17172194  0.13121135  1.000000000
## Salt       -0.03626646 -0.21687200 -0.17488799  0.44399118  0.026936813
## Satisfying  0.64850720  0.46324724  0.40804658  0.18477668  0.362309849
## Energy      0.63675882  0.49354159  0.50373069  0.18496133  0.182630082
## Fun         0.26521397  0.08190321  0.06273083  0.32722313  0.248246218
## Kids        0.16000741  0.06029259 -0.09483085  0.12324275  0.251967889
## Soggy      -0.05988555  0.06839973 -0.04166740 -0.08375192 -0.008804016
## Economical  0.05194244  0.10316137 -0.03396533 -0.23981376  0.092549751
## Health      0.54706871  0.68809770  0.68398369 -0.11562213  0.204021617
## Family      0.23287605  0.10708173 -0.01039870  0.04343192  0.235175664
## Calories    0.04721422 -0.16167366 -0.18654202  0.46731243 -0.015414590
## Plain      -0.25064803 -0.13851302 -0.12284590 -0.28955897  0.017651539
## Crisp       0.12650057  0.02152927  0.04832402  0.26276634  0.252223659
## Regular     0.42049880  0.41763842  0.64837568 -0.02518025  0.105246439
## Sugar      -0.07851945 -0.31680448 -0.22556725  0.64838267 -0.005731638
## Fruit       0.26116604  0.30015027  0.29314106  0.34650542  0.037197487
## Process    -0.23171929 -0.30454423 -0.19522957  0.11851882 -0.048325631
## Quality     0.44321697  0.57909956  0.51319376 -0.07754712  0.171421388
## Treat       0.33764173  0.16939794  0.13967658  0.37627476  0.198897222
## Boring     -0.17785084 -0.21758679 -0.09925867 -0.20033406 -0.167536447
##                     Salt    Satisfying      Energy          Fun
## Filling    -3.626646e-02  6.485072e-01  0.63675882  0.265213973
## Natural    -2.168720e-01  4.632472e-01  0.49354159  0.081903207
## Fibre      -1.748880e-01  4.080466e-01  0.50373069  0.062730827
## Sweet       4.439912e-01  1.847767e-01  0.18496133  0.327223134
## Easy        2.693681e-02  3.623098e-01  0.18263008  0.248246218
## Salt        1.000000e+00  5.371153e-05 -0.06713581  0.033474536
## Satisfying  5.371153e-05  1.000000e+00  0.59966696  0.354852245
## Energy     -6.713581e-02  5.996670e-01  1.00000000  0.350327368
## Fun         3.347454e-02  3.548522e-01  0.35032737  1.000000000
## Kids        3.298221e-02  3.121943e-01  0.13057221  0.349965961
## Soggy       2.359707e-02 -1.425032e-02 -0.04592438 -0.098754958
## Economical -1.259049e-01  2.137707e-01  0.02641362  0.040700477
## Health     -2.283768e-01  5.181843e-01  0.52424330  0.100955593
## Family     -7.965423e-02  3.546910e-01  0.19136018  0.352476715
## Calories    4.380975e-01  1.219290e-02  0.03362541  0.113449859
## Plain       2.137203e-02 -1.795016e-01 -0.25577344 -0.322275476
## Crisp       1.033722e-01  2.762034e-01  0.24829752  0.402670109
## Regular    -1.645302e-01  3.282587e-01  0.38571918  0.136731512
## Sugar       5.917709e-01 -8.119525e-02 -0.08606954  0.165290744
## Fruit       2.557426e-02  2.538439e-01  0.27438372  0.251421273
## Process     3.048367e-01 -1.649989e-01 -0.10157043 -0.002202497
## Quality    -2.178523e-01  4.747221e-01  0.45703627  0.224503157
## Treat       1.278963e-01  3.821680e-01  0.32246211  0.586497175
## Boring      1.122315e-01 -3.156052e-01 -0.22338882 -0.298063613
##                   Kids        Soggy  Economical       Health      Family
## Filling     0.16000741 -0.059885555  0.05194244  0.547068708  0.23287605
## Natural     0.06029259  0.068399728  0.10316137  0.688097695  0.10708173
## Fibre      -0.09483085 -0.041667404 -0.03396533  0.683983690 -0.01039870
## Sweet       0.12324275 -0.083751919 -0.23981376 -0.115622126  0.04343192
## Easy        0.25196789 -0.008804016  0.09254975  0.204021617  0.23517566
## Salt        0.03298221  0.023597066 -0.12590486 -0.228376777 -0.07965423
## Satisfying  0.31219433 -0.014250315  0.21377074  0.518184332  0.35469098
## Energy      0.13057221 -0.045924383  0.02641362  0.524243298  0.19136018
## Fun         0.34996596 -0.098754958  0.04070048  0.100955593  0.35247672
## Kids        1.00000000  0.087658404  0.30491206 -0.012760939  0.72701108
## Soggy       0.08765840  1.000000000  0.11715122  0.006146656  0.08138269
## Economical  0.30491206  0.117151217  1.00000000  0.192658638  0.23335601
## Health     -0.01276094  0.006146656  0.19265864  1.000000000  0.08211150
## Family      0.72701108  0.081382689  0.23335601  0.082111496  1.00000000
## Calories    0.01435653 -0.079664961 -0.21047144 -0.307176155 -0.06072469
## Plain       0.02921320  0.346129827  0.23120114 -0.099609317 -0.02899279
## Crisp       0.30194580 -0.335632858  0.09269221  0.082416385  0.28807642
## Regular    -0.02583127 -0.137300090  0.08029354  0.543222577  0.04374372
## Sugar      -0.01589207 -0.094456381 -0.29255416 -0.376892968 -0.05448568
## Fruit      -0.23212140 -0.137035700 -0.33848391  0.266341061 -0.12389300
## Process     0.02697130  0.058660437 -0.12542062 -0.289470005 -0.01293971
## Quality     0.11634764 -0.029808614  0.21549364  0.686304848  0.24107378
## Treat       0.28528848 -0.254809103 -0.03971178  0.214279460  0.30077242
## Boring     -0.19372777  0.226885825 -0.02137835 -0.228589063 -0.24764977
##               Calories       Plain       Crisp     Regular        Sugar
## Filling     0.04721422 -0.25064803  0.12650057  0.42049880 -0.078519448
## Natural    -0.16167366 -0.13851302  0.02152927  0.41763842 -0.316804483
## Fibre      -0.18654202 -0.12284590  0.04832402  0.64837568 -0.225567250
## Sweet       0.46731243 -0.28955897  0.26276634 -0.02518025  0.648382667
## Easy       -0.01541459  0.01765154  0.25222366  0.10524644 -0.005731638
## Salt        0.43809745  0.02137203  0.10337222 -0.16453021  0.591770895
## Satisfying  0.01219290 -0.17950163  0.27620343  0.32825868 -0.081195255
## Energy      0.03362541 -0.25577344  0.24829752  0.38571918 -0.086069535
## Fun         0.11344986 -0.32227548  0.40267011  0.13673151  0.165290744
## Kids        0.01435653  0.02921320  0.30194580 -0.02583127 -0.015892067
## Soggy      -0.07966496  0.34612983 -0.33563286 -0.13730009 -0.094456381
## Economical -0.21047144  0.23120114  0.09269221  0.08029354 -0.292554157
## Health     -0.30717616 -0.09960932  0.08241638  0.54322258 -0.376892968
## Family     -0.06072469 -0.02899279  0.28807642  0.04374372 -0.054485684
## Calories    1.00000000 -0.07619086  0.14705033 -0.16490865  0.525826174
## Plain      -0.07619086  1.00000000 -0.20966191 -0.08026008 -0.146856923
## Crisp       0.14705033 -0.20966191  1.00000000  0.13330386  0.168930691
## Regular    -0.16490865 -0.08026008  0.13330386  1.00000000 -0.090571672
## Sugar       0.52582617 -0.14685692  0.16893069 -0.09057167  1.000000000
## Fruit       0.12593786 -0.34308629  0.09022960  0.25474509  0.145330476
## Process     0.27389198  0.11318637  0.02484550 -0.14912239  0.369253388
## Quality    -0.20135015 -0.22690816  0.13432982  0.44147633 -0.263389434
## Treat       0.19292930 -0.42989349  0.46808561  0.16655163  0.217100889
## Boring     -0.02701569  0.33052554 -0.32358917 -0.09469787 -0.000921067
##                  Fruit      Process     Quality       Treat       Boring
## Filling     0.26116604 -0.231719289  0.44321697  0.33764173 -0.177850835
## Natural     0.30015027 -0.304544226  0.57909956  0.16939794 -0.217586787
## Fibre       0.29314106 -0.195229572  0.51319376  0.13967658 -0.099258673
## Sweet       0.34650542  0.118518821 -0.07754712  0.37627476 -0.200334059
## Easy        0.03719749 -0.048325631  0.17142139  0.19889722 -0.167536447
## Salt        0.02557426  0.304836682 -0.21785225  0.12789628  0.112231476
## Satisfying  0.25384390 -0.164998880  0.47472211  0.38216799 -0.315605190
## Energy      0.27438372 -0.101570429  0.45703627  0.32246211 -0.223388817
## Fun         0.25142127 -0.002202497  0.22450316  0.58649717 -0.298063613
## Kids       -0.23212140  0.026971295  0.11634764  0.28528848 -0.193727771
## Soggy      -0.13703570  0.058660437 -0.02980861 -0.25480910  0.226885825
## Economical -0.33848391 -0.125420621  0.21549364 -0.03971178 -0.021378353
## Health      0.26634106 -0.289470005  0.68630485  0.21427946 -0.228589063
## Family     -0.12389300 -0.012939711  0.24107378  0.30077242 -0.247649766
## Calories    0.12593786  0.273891978 -0.20135015  0.19292930 -0.027015687
## Plain      -0.34308629  0.113186375 -0.22690816 -0.42989349  0.330525544
## Crisp       0.09022960  0.024845497  0.13432982  0.46808561 -0.323589173
## Regular     0.25474509 -0.149122387  0.44147633  0.16655163 -0.094697872
## Sugar       0.14533048  0.369253388 -0.26338943  0.21710089 -0.000921067
## Fruit       1.00000000 -0.140204367  0.16460384  0.31255476 -0.260061699
## Process    -0.14020437  1.000000000 -0.18304941  0.03109584  0.171709754
## Quality     0.16460384 -0.183049413  1.00000000  0.33407508 -0.284256014
## Treat       0.31255476  0.031095839  0.33407508  1.00000000 -0.359339636
## Boring     -0.26006170  0.171709754 -0.28425601 -0.35933964  1.000000000

corrplot(revised_cereal_data.cormatrix, method="shade", type="full", addCoef.col = "blue", order ="AOE", bg ='grey')

Bartlett’s test of sphericity checks whether the correlation matrix differs from an identity matrix. Our null hypothesis is that the variables are uncorrelated, in which case no factors exist and FCA can’t be done. Our p-value threshold is 0.05. The reported p-value is 0 (that is, below numerical precision), so the null hypothesis is rejected and we can proceed with FCA.

revised_cereal_data.barttest <- cortest.bartlett(revised_cereal_data.cormatrix, n=nrow(cereal_data))
print(revised_cereal_data.barttest)

## $chisq
## [1] 2613.019
## 
## $p.value
## [1] 0
## 
## $df
## [1] 276
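To make the test statistic concrete, here is a minimal base-R sketch of Bartlett's test of sphericity. The function name `bartlett_sphericity` and the toy data are assumptions made for illustration; the formula is the standard chi-square approximation, with p(p-1)/2 degrees of freedom (24 variables give the 276 reported above).

```r
# Bartlett's test of sphericity computed by hand from a correlation matrix R
# and a sample size n (hypothetical helper, written for this sketch).
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(
    chisq   = chisq,
    df      = df,
    p.value = pchisq(chisq, df, lower.tail = FALSE)
  )
}

# Toy data: three variables share a common component, one is pure noise.
set.seed(1)
common <- rnorm(200)
toy <- data.frame(
  a = common + rnorm(200),
  b = common + rnorm(200),
  c = common + rnorm(200),
  d = rnorm(200)
)

result <- bartlett_sphericity(cor(toy), n = nrow(toy))
print(result)

# An identity matrix gives a statistic of 0 and a p-value of 1.
identity_result <- bartlett_sphericity(diag(4), n = 100)
```

A small p-value here only says that the variables are correlated at all; it does not say how many factors there are.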

Next we run the Kaiser-Meyer-Olkin (KMO) test. The overall MSA (measure of sampling adequacy) is 0.84, which indicates a high degree of common variance. The MSA of every individual variable is also above 0.5 (the lowest is Soggy at 0.64). This confirms that FCA can be done on this data.

For reference, Kaiser labelled the results as follows: 0.00 to 0.49 unacceptable; 0.50 to 0.59 miserable; 0.60 to 0.69 mediocre; 0.70 to 0.79 middling; 0.80 to 0.89 meritorious; 0.90 to 1.00 marvelous.

KMO(revised_cereal_data.cormatrix_rounded)

## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = revised_cereal_data.cormatrix_rounded)
## Overall MSA =  0.84
## MSA for each item = 
##    Filling    Natural      Fibre      Sweet       Easy       Salt 
##       0.88       0.88       0.87       0.79       0.83       0.82 
## Satisfying     Energy        Fun       Kids      Soggy Economical 
##       0.90       0.90       0.85       0.68       0.64       0.72 
##     Health     Family   Calories      Plain      Crisp    Regular 
##       0.89       0.73       0.85       0.81       0.83       0.83 
##      Sugar      Fruit    Process    Quality      Treat     Boring 
##       0.78       0.77       0.80       0.90       0.87       0.88
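As a rough illustration of what the MSA measures, the sketch below computes it by hand in base R: squared correlations are compared with squared partial correlations, which are obtained from the scaled inverse of the correlation matrix. The helper names `kmo_by_hand` and `kaiser_label` and the toy matrix are assumptions for this example, not part of the analysis above.

```r
# KMO / MSA computed from a correlation matrix R (hypothetical helper).
kmo_by_hand <- function(R) {
  # Scaled inverse of R: its off-diagonals are minus the partial correlations.
  Q <- cov2cor(solve(R))
  diag(Q) <- 0
  diag(R) <- 0
  list(
    overall  = sum(R^2) / (sum(R^2) + sum(Q^2)),
    per_item = colSums(R^2) / (colSums(R^2) + colSums(Q^2))
  )
}

# Map an MSA value to Kaiser's verbal label (hypothetical helper).
kaiser_label <- function(msa) {
  labels <- c("unacceptable", "miserable", "mediocre",
              "middling", "meritorious", "marvelous")
  labels[findInterval(msa, c(0.5, 0.6, 0.7, 0.8, 0.9)) + 1]
}

# Toy correlation matrix: three variables, all pairwise correlations 0.5.
R <- matrix(0.5, nrow = 3, ncol = 3)
diag(R) <- 1
dimnames(R) <- list(c("a", "b", "c"), c("a", "b", "c"))

kmo <- kmo_by_hand(R)
print(kmo$overall)
print(kaiser_label(c(0.84, 0.64)))
```

With Kaiser's labels, the overall value of 0.84 reported above falls in the "meritorious" band and the lowest item value (0.64) in the "mediocre" band.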

The first step of the FCA is to find the optimal number of factors. We use the fa.parallel function from the psych package for this. One rule of thumb is to keep the factors whose eigenvalue is at least 1. Another is to take the number of factors at which the scree plot shows an elbow. The factoring method is principal axis (fm = 'pa'). The call draws the scree plot, and the parallel analysis suggests 4 factors.

revised_cereal_data.fca <- fa.parallel(revised_cereal_data, fm = 'pa', fa = 'fa')

## Parallel analysis suggests that the number of factors =  4  and the number of components =  NA
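The idea behind parallel analysis can be sketched in base R: eigenvalues of the observed correlation matrix are compared with the average eigenvalues of random data of the same shape. This is a simplified illustration on simulated data. It uses eigenvalues of the full correlation matrix (the component version), whereas the fa.parallel call above with fa = 'fa' works with factor eigenvalues, so the two need not agree exactly. All names and data here are assumptions.

```r
set.seed(42)
n <- 300

# Toy data with two latent factors, three indicators each.
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  x1 = f1 + 0.6 * rnorm(n),
  x2 = f1 + 0.6 * rnorm(n),
  x3 = f1 + 0.6 * rnorm(n),
  y1 = f2 + 0.6 * rnorm(n),
  y2 = f2 + 0.6 * rnorm(n),
  y3 = f2 + 0.6 * rnorm(n)
)

# Eigenvalues of the observed correlation matrix, largest first.
observed <- eigen(cor(toy), only.values = TRUE)$values

# Eigenvalues of uncorrelated random data with the same dimensions.
n_sims <- 50
random_eigs <- replicate(n_sims, {
  noise <- matrix(rnorm(n * ncol(toy)), nrow = n)
  eigen(cor(noise), only.values = TRUE)$values
})
threshold <- rowMeans(random_eigs)  # mean random eigenvalue per rank

# Keep the leading run of eigenvalues that beat their random counterpart.
n_retain <- sum(cumprod(observed > threshold))

# Eigenvalue-at-least-1 rule of thumb, for comparison.
n_kaiser <- sum(observed >= 1)

print(round(observed, 2))
print(c(parallel = n_retain, kaiser = n_kaiser))
```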

FCA is now run with 4 factors. As a rule of thumb, the first FCA is done without rotation to see whether the grouping of variables into factors makes sense. The “Cumulative Proportion” row shows that PA1, PA2 and PA3 together account for 91% of the variance explained by the four factors (about 45% of the total variance, per the “Cumulative Var” row). Hence our final selected factors are PA1, PA2 and PA3.

Now we validate the output of the FCA. The root mean square of residuals (RMSR) is 0.04, which is acceptable because this value should be close to 0. The RMSEA (root mean square error of approximation) index is 0.07, which is above 0.05 and may be a concern. The “Tucker Lewis Index” is 0.876, which is close to 0.9 and therefore good.

revised_cereal_data.factor <- fa(revised_cereal_data,nfactors = 4,rotate = "none",fm="pa")
print(revised_cereal_data.factor)

## Factor Analysis using method =  pa
## Call: fa(r = revised_cereal_data, nfactors = 4, rotate = "none", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##              PA1   PA2   PA3   PA4   h2   u2 com
## Filling     0.73  0.04 -0.09  0.22 0.60 0.40 1.2
## Natural     0.71 -0.29 -0.14  0.13 0.62 0.38 1.5
## Fibre       0.68 -0.28 -0.33  0.19 0.69 0.31 2.0
## Sweet       0.13  0.74 -0.18  0.14 0.62 0.38 1.3
## Easy        0.33  0.11  0.19  0.11 0.17 0.83 2.1
## Salt       -0.19  0.53 -0.11  0.39 0.48 0.52 2.2
## Satisfying  0.74  0.12  0.14  0.19 0.62 0.38 1.3
## Energy      0.70  0.08 -0.09  0.14 0.53 0.47 1.1
## Fun         0.44  0.45  0.20 -0.17 0.46 0.54 2.7
## Kids        0.26  0.24  0.77  0.10 0.73 0.27 1.5
## Soggy      -0.12 -0.21  0.16  0.39 0.24 0.76 2.1
## Economical  0.14 -0.25  0.46  0.09 0.31 0.69 1.9
## Health      0.78 -0.36 -0.14  0.10 0.77 0.23 1.5
## Family      0.35  0.17  0.67  0.02 0.59 0.41 1.7
## Calories   -0.12  0.58 -0.15  0.21 0.42 0.58 1.5
## Plain      -0.34 -0.33  0.23  0.42 0.46 0.54 3.5
## Crisp       0.33  0.41  0.21 -0.19 0.36 0.64 2.9
## Regular     0.56 -0.16 -0.20  0.08 0.38 0.62 1.5
## Sugar      -0.21  0.77 -0.22  0.23 0.73 0.27 1.5
## Fruit       0.38  0.23 -0.48 -0.14 0.44 0.56 2.6
## Process    -0.28  0.29  0.03  0.21 0.21 0.79 2.8
## Quality     0.70 -0.19  0.02 -0.02 0.53 0.47 1.1
## Treat       0.51  0.52  0.07 -0.22 0.59 0.41 2.4
## Boring     -0.41 -0.22 -0.08  0.33 0.34 0.66 2.6
## 
##                        PA1  PA2  PA3  PA4
## SS loadings           5.47 3.30 2.03 1.10
## Proportion Var        0.23 0.14 0.08 0.05
## Cumulative Var        0.23 0.37 0.45 0.50
## Proportion Explained  0.46 0.28 0.17 0.09
## Cumulative Proportion 0.46 0.74 0.91 1.00
## 
## Mean item complexity =  1.9
## Test of the hypothesis that 4 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  11.6 with Chi Square of  2613.02
## The degrees of freedom for the model are 186  and the objective function was  1.7 
## 
## The root mean square of the residuals (RMSR) is  0.04 
## The df corrected root mean square of the residuals is  0.05 
## 
## The harmonic number of observations is  235 with the empirical chi square  199.55  with prob <  0.24 
## The total number of observations was  235  with Likelihood Chi Square =  378.03  with prob <  3.5e-15 
## 
## Tucker Lewis Index of factoring reliability =  0.876
## RMSEA index =  0.07  and the 90 % confidence intervals are  0.057 0.076
## BIC =  -637.46
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy             
##                                                    PA1  PA2  PA3  PA4
## Correlation of (regression) scores with factors   0.97 0.94 0.92 0.83
## Multiple R square of scores with factors          0.93 0.89 0.84 0.68
## Minimum correlation of possible factor scores     0.87 0.78 0.68 0.37
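The variance rows in the output above follow directly from the SS loadings. A small worked example, using the four SS loadings as printed (rounded to two decimals) and the 24 variables in the analysis:

```r
# SS loadings as printed in the fa() output above (rounded values).
ss_loadings <- c(PA1 = 5.47, PA2 = 3.30, PA3 = 2.03, PA4 = 1.10)
n_variables <- 24

# Share of total variance: each standardized variable contributes 1.
proportion_var <- ss_loadings / n_variables
cumulative_var <- cumsum(proportion_var)

# Share of the variance explained by the four extracted factors.
proportion_explained  <- ss_loadings / sum(ss_loadings)
cumulative_proportion <- cumsum(proportion_explained)

print(round(cumulative_var, 2))
print(round(cumulative_proportion, 2))
```

The third entry of `cumulative_proportion` is the 91% figure quoted for PA1 to PA3, while the third entry of `cumulative_var` shows that the same three factors cover about 45% of the total variance.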

print(revised_cereal_data.factor$loadings,cutoff = 0.5)  ## Only one variable (Treat) loads above 0.5 on two factors

## 
## Loadings:
##            PA1    PA2    PA3    PA4   
## Filling     0.733                     
## Natural     0.706                     
## Fibre       0.683                     
## Sweet              0.742              
## Easy                                  
## Salt               0.535              
## Satisfying  0.741                     
## Energy      0.705                     
## Fun                                   
## Kids                      0.768       
## Soggy                                 
## Economical                            
## Health      0.775                     
## Family                    0.665       
## Calories           0.583              
## Plain                                 
## Crisp                                 
## Regular     0.555                     
## Sugar              0.766              
## Fruit                                 
## Process                               
## Quality     0.704                     
## Treat       0.515  0.525              
## Boring                                
## 
##                  PA1   PA2   PA3   PA4
## SS loadings    5.466 3.304 2.028 1.095
## Proportion Var 0.228 0.138 0.085 0.046
## Cumulative Var 0.228 0.365 0.450 0.496
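Cross-loadings can also be found programmatically instead of by reading the table. The sketch below uses a small toy matrix whose four rows are copied from the two-decimal loadings printed earlier (Filling, Sweet, Kids, Treat). The variable names in the code are assumptions for illustration.

```r
# Four rows of the unrotated loadings, as printed to two decimals above.
loadings_toy <- matrix(
  c(0.73, 0.04, -0.09,  0.22,
    0.13, 0.74, -0.18,  0.14,
    0.26, 0.24,  0.77,  0.10,
    0.51, 0.52,  0.07, -0.22),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("Filling", "Sweet", "Kids", "Treat"),
                  c("PA1", "PA2", "PA3", "PA4"))
)

cutoff <- 0.5

# TRUE where the absolute loading reaches the cut-off.
salient <- abs(loadings_toy) >= cutoff

# Variables that are salient on more than one factor (cross-loadings).
n_salient <- rowSums(salient)
cross_loading <- names(n_salient)[n_salient > 1]

# Factor with the largest absolute loading for each variable.
primary_factor <- colnames(loadings_toy)[apply(abs(loadings_toy), 1, which.max)]

print(cross_loading)
print(primary_factor)
```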

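Next we try the FCA with rotation. Note that the rotation name in the call below is misspelled ("verimax" instead of "varimax"), so fa() reports “Specified rotation not found, rotate='none' used” and the output is identical to the unrotated solution. The root mean square of residuals (RMSR) is 0.04, the RMSEA index is 0.07 and the “Tucker Lewis Index” is 0.876, exactly as before. An orthogonal rotation would not change these fit indices in any case, because it only redistributes the loadings across factors, so this output says nothing yet about the effect of rotation on the factor structure.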

revised_cereal_data.factor_rotate <- fa(revised_cereal_data,nfactors = 4,rotate = "verimax",fm="pa")

## Specified rotation not found, rotate='none' used

print(revised_cereal_data.factor_rotate)

## Factor Analysis using method =  pa
## Call: fa(r = revised_cereal_data, nfactors = 4, rotate = "verimax", 
##     fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##              PA1   PA2   PA3   PA4   h2   u2 com
## Filling     0.73  0.04 -0.09  0.22 0.60 0.40 1.2
## Natural     0.71 -0.29 -0.14  0.13 0.62 0.38 1.5
## Fibre       0.68 -0.28 -0.33  0.19 0.69 0.31 2.0
## Sweet       0.13  0.74 -0.18  0.14 0.62 0.38 1.3
## Easy        0.33  0.11  0.19  0.11 0.17 0.83 2.1
## Salt       -0.19  0.53 -0.11  0.39 0.48 0.52 2.2
## Satisfying  0.74  0.12  0.14  0.19 0.62 0.38 1.3
## Energy      0.70  0.08 -0.09  0.14 0.53 0.47 1.1
## Fun         0.44  0.45  0.20 -0.17 0.46 0.54 2.7
## Kids        0.26  0.24  0.77  0.10 0.73 0.27 1.5
## Soggy      -0.12 -0.21  0.16  0.39 0.24 0.76 2.1
## Economical  0.14 -0.25  0.46  0.09 0.31 0.69 1.9
## Health      0.78 -0.36 -0.14  0.10 0.77 0.23 1.5
## Family      0.35  0.17  0.67  0.02 0.59 0.41 1.7
## Calories   -0.12  0.58 -0.15  0.21 0.42 0.58 1.5
## Plain      -0.34 -0.33  0.23  0.42 0.46 0.54 3.5
## Crisp       0.33  0.41  0.21 -0.19 0.36 0.64 2.9
## Regular     0.56 -0.16 -0.20  0.08 0.38 0.62 1.5
## Sugar      -0.21  0.77 -0.22  0.23 0.73 0.27 1.5
## Fruit       0.38  0.23 -0.48 -0.14 0.44 0.56 2.6
## Process    -0.28  0.29  0.03  0.21 0.21 0.79 2.8
## Quality     0.70 -0.19  0.02 -0.02 0.53 0.47 1.1
## Treat       0.51  0.52  0.07 -0.22 0.59 0.41 2.4
## Boring     -0.41 -0.22 -0.08  0.33 0.34 0.66 2.6
## 
##                        PA1  PA2  PA3  PA4
## SS loadings           5.47 3.30 2.03 1.10
## Proportion Var        0.23 0.14 0.08 0.05
## Cumulative Var        0.23 0.37 0.45 0.50
## Proportion Explained  0.46 0.28 0.17 0.09
## Cumulative Proportion 0.46 0.74 0.91 1.00
## 
## Mean item complexity =  1.9
## Test of the hypothesis that 4 factors are sufficient.
## 
## The degrees of freedom for the null model are  276  and the objective function was  11.6 with Chi Square of  2613.02
## The degrees of freedom for the model are 186  and the objective function was  1.7 
## 
## The root mean square of the residuals (RMSR) is  0.04 
## The df corrected root mean square of the residuals is  0.05 
## 
## The harmonic number of observations is  235 with the empirical chi square  199.55  with prob <  0.24 
## The total number of observations was  235  with Likelihood Chi Square =  378.03  with prob <  3.5e-15 
## 
## Tucker Lewis Index of factoring reliability =  0.876
## RMSEA index =  0.07  and the 90 % confidence intervals are  0.057 0.076
## BIC =  -637.46
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy             
##                                                    PA1  PA2  PA3  PA4
## Correlation of (regression) scores with factors   0.97 0.94 0.92 0.83
## Multiple R square of scores with factors          0.93 0.89 0.84 0.68
## Minimum correlation of possible factor scores     0.87 0.78 0.68 0.37

print(revised_cereal_data.factor$loadings_rotate,cutoff = 0.5)  ## Prints NULL: the fit has no element named loadings_rotate; the intended object is revised_cereal_data.factor_rotate$loadings

## NULL
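For comparison, here is a minimal sketch of what a varimax rotation does, using `varimax()` from base R's stats package on a toy loading matrix. The six rows are the PA1 and PA2 loadings of a few variables, copied from the two-decimal output above. The sketch does not reproduce the four-factor rotated solution, and the object names are assumptions. In psych, the equivalent request would be `rotate = "varimax"` in the `fa()` call.

```r
# PA1 and PA2 loadings of six variables, as printed to two decimals above.
unrotated <- matrix(
  c( 0.73,  0.04,
     0.71, -0.29,
     0.78, -0.36,
     0.13,  0.74,
    -0.21,  0.77,
    -0.12,  0.58),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Filling", "Natural", "Health", "Sweet", "Sugar", "Calories"),
                  c("PA1", "PA2"))
)

# Orthogonal varimax rotation from the stats package.
rot <- varimax(unrotated)
rotated <- unclass(rot$loadings)

# Communalities (row sums of squared loadings) before and after rotation.
h2_before <- rowSums(unrotated^2)
h2_after  <- rowSums(rotated^2)

print(round(rotated, 2))
print(round(cbind(h2_before, h2_after), 3))
```

The communalities are identical before and after rotation, which is why fit indices such as RMSR do not move; only the pattern of loadings changes.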

Our selected factors are PA1, PA2 and PA3. Now we need to do the factor mapping. With 0.5 as the loading cut-off, the final factor mapping is:

PA1: Health, Satisfying, Filling, Natural, Energy, Quality, Fibre and Regular (latent factor name can be: Healthy)

PA2: Sugar, Sweet, Calories, Salt and Treat (latent factor name can be: Not-Calorie Conscious). Treat also loads 0.515 on PA1 but is assigned to PA2, where its loading (0.525) is slightly higher.

PA3: Kids and Family (latent factor name can be: Family Oriented)

fa.diagram(revised_cereal_data.factor,cut = 0.5)

We have identified the latent factors PA1, PA2 and PA3. Next, we build one composite score per factor as the row mean of the variables mapped to it.

# Column positions of the variables mapped to each factor.
# Note: "Filling" is listed twice for PA1, so it carries double weight in the Healthy mean.
myfactorColumnno1 <- fmatch(c("Health", "Satisfying", "Filling", "Natural", "Filling", "Energy", "Quality", "Fibre", "Regular"), names(cereal_data))

myfactorColumnno2 <- fmatch(c("Sugar", "Sweet", "Calories", "Salt", "Treat"), names(cereal_data))

myfactorColumnno3 <- fmatch(c("Kids", "Family"), names(cereal_data))

# Composite score per respondent: row mean of the variables mapped to each factor
cereal_data$Healthy <- rowMeans(cereal_data[, myfactorColumnno1])
cereal_data$Non_Caclorie_Conscious <- rowMeans(cereal_data[, myfactorColumnno2])
cereal_data$Family_Oriented <- rowMeans(cereal_data[, myfactorColumnno3])

str(cereal_data)

## 'data.frame':    235 obs. of  29 variables:
##  $ Cereals               : Factor w/ 12 levels "AllBran","CMuesli",..: 12 9 9 2 3 8 9 9 8 3 ...
##  $ Filling               : int  5 1 5 5 4 4 4 4 4 4 ...
##  $ Natural               : int  5 2 4 5 5 4 4 3 3 3 ...
##  $ Fibre                 : int  5 2 5 5 3 4 3 3 3 3 ...
##  $ Sweet                 : int  1 1 5 3 2 2 2 2 2 2 ...
##  $ Easy                  : int  2 5 5 5 5 5 5 5 5 5 ...
##  $ Salt                  : int  1 2 3 2 2 2 1 1 1 1 ...
##  $ Satisfying            : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Energy                : int  4 1 5 5 4 4 5 4 4 4 ...
##  $ Fun                   : int  1 1 5 5 5 5 5 4 4 4 ...
##  $ Kids                  : int  4 5 5 5 5 5 5 5 5 5 ...
##  $ Soggy                 : int  5 3 3 3 1 1 1 1 1 1 ...
##  $ Economical            : int  5 5 3 3 5 5 5 3 3 3 ...
##  $ Health                : int  5 2 5 5 5 4 5 4 4 4 ...
##  $ Family                : int  5 5 5 5 3 5 5 5 5 5 ...
##  $ Calories              : int  1 1 1 1 3 3 3 2 2 2 ...
##  $ Plain                 : int  3 5 1 1 1 1 1 3 3 3 ...
##  $ Crisp                 : int  1 5 5 1 5 5 5 4 4 4 ...
##  $ Regular               : int  4 1 4 4 3 3 3 4 4 4 ...
##  $ Sugar                 : int  1 2 3 2 1 2 2 1 1 1 ...
##  $ Fruit                 : int  1 1 1 5 1 1 1 1 1 1 ...
##  $ Process               : int  3 5 2 2 3 3 3 2 2 2 ...
##  $ Quality               : int  5 2 5 5 5 5 5 4 4 4 ...
##  $ Treat                 : int  1 1 4 5 5 5 5 2 2 2 ...
##  $ Boring                : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Nutritious            : int  5 3 5 5 4 4 4 3 3 3 ...
##  $ Healthy               : num  4.78 1.89 4.78 4.89 4.22 ...
##  $ Non_Caclorie_Conscious: num  1 1.4 3.2 2.6 2.6 2.8 2.6 1.6 1.6 1.6 ...
##  $ Family_Oriented       : num  4.5 5 5 5 4 5 5 5 5 5 ...
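
The Healthy values printed above can be checked by hand from the first five respondents. A small sketch follows, using only numbers copied from the `str()` output. It also shows the effect of the column list behind the Healthy score, in which Filling appears twice; `first5`, `cols_with_dup` and `cols_unique` are illustrative names.

```r
# First five respondents, copied from the str() output above
first5 <- data.frame(
  Health     = c(5, 2, 5, 5, 5),
  Satisfying = c(5, 5, 5, 5, 5),
  Filling    = c(5, 1, 5, 5, 4),
  Natural    = c(5, 2, 4, 5, 5),
  Energy     = c(4, 1, 5, 5, 4),
  Quality    = c(5, 2, 5, 5, 5),
  Fibre      = c(5, 2, 5, 5, 3),
  Regular    = c(4, 1, 4, 4, 3)
)

# Column list as used for the Healthy score, with "Filling" appearing twice
cols_with_dup <- c("Health", "Satisfying", "Filling", "Natural", "Filling",
                   "Energy", "Quality", "Fibre", "Regular")
# The eight distinct variables named in the PA1 mapping
cols_unique <- unique(cols_with_dup)

healthy_with_dup <- rowMeans(first5[, cols_with_dup])
healthy_unique   <- rowMeans(first5[, cols_unique])

print(round(healthy_with_dup, 2))
print(round(healthy_unique, 2))
```

The first vector reproduces the printed values (4.78, 1.89, 4.78, 4.89, 4.22), which confirms that Filling is counted twice in the reported score. Averaging the eight distinct variables shifts each value slightly, for example to 4.75 and 2.00 for the first two respondents.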

### Next, we create a table with the mean value of each of the three factors per cereal brand

myresult <- aggregate(cbind(cereal_data$Healthy,cereal_data$Non_Caclorie_Conscious,cereal_data$Family_Oriented), by = list(cereal_data$Cereals), FUN = mean)

colnames(myresult)[2:4] <-c("Healthy", "Non Caclorie Conscious", "Family Oriented")

myresult$Healthy  <- ceiling(myresult$Healthy)
myresult$`Non Caclorie Conscious` <- ceiling(myresult$`Non Caclorie Conscious`)
myresult$`Family Oriented` <-   ceiling(myresult$`Family Oriented`)

## Now we interpret the brand survey outcome. The means were rounded up with ceiling(), so the labels are:
## Value: (1-3: No) and (4-5: Yes)

## recoding the data

myresult$Healthy <- recode(myresult$Healthy, "1='No'; 2='No'; 3='No'; 4='Yes'; 5='Yes'")
myresult$`Non Caclorie Conscious` <- recode(myresult$`Non Caclorie Conscious`, "1='No'; 2='No'; 3='No'; 4='Yes'; 5='Yes'")
myresult$`Family Oriented` <- recode(myresult$`Family Oriented`, "1='No'; 2='No'; 3='No'; 4='Yes'; 5='Yes'")
myresult

##        Group.1 Healthy Non Caclorie Conscious Family Oriented
## 1      AllBran     Yes                     No              No
## 2      CMuesli     Yes                     No             Yes
## 3   CornFlakes     Yes                     No             Yes
## 4    JustRight     Yes                     No             Yes
## 5     Komplete     Yes                     No              No
## 6   NutriGrain     Yes                    Yes             Yes
## 7      PMuesli     Yes                     No             Yes
## 8  RiceBubbles      No                     No             Yes
## 9     SpecialK     Yes                     No             Yes
## 10     Sustain     Yes                     No             Yes
## 11    Vitabrit     Yes                     No             Yes
## 12    Weetabix     Yes                     No             Yes
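
The labelling rule used for this table (round the brand mean up, then map 1-3 to No and 4-5 to Yes) amounts to a single threshold on the unrounded mean. A short sketch with made-up brand means follows; `brand_means` and the brand labels A to D are hypothetical.

```r
# Hypothetical brand-level mean ratings on the 1-5 scale (illustration only)
brand_means <- c(A = 2.40, B = 3.00, C = 3.05, D = 4.60)

# Route used above: round up, then label 1-3 as "No" and 4-5 as "Yes"
via_ceiling <- ifelse(ceiling(brand_means) >= 4, "Yes", "No")

# Equivalent single comparison on the unrounded mean
via_threshold <- ifelse(brand_means > 3, "Yes", "No")

print(rbind(via_ceiling, via_threshold))
```

Both routes give the same labels, so any brand whose mean is even slightly above 3 (such as the hypothetical 3.05) is reported as Yes.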

Factor scores can also be used to interpret how each cereal brand is perceived.

# Factor scores estimated by the factor analysis, one row per respondent
revised_cereal_data.factorscore <-  revised_cereal_data.factor$scores

# Attach the cereal brand to each respondent's factor scores
mydata <- as.data.frame(cereal_data$Cereals)
mkwithscore <-cbind(mydata,revised_cereal_data.factorscore)

# Mean absolute factor score per brand (abs() drops the sign of each score)
myresultmk <- aggregate(cbind(abs(mkwithscore$PA1),abs(mkwithscore$PA2),abs(mkwithscore$PA3)), by = list(cereal_data$Cereals), FUN = mean)

# Carry over the Yes/No labels from the survey-mean table (already recoded there)
myresultmk$Healthy <- myresult$Healthy
myresultmk$`Non Caclorie Conscious` <- myresult$`Non Caclorie Conscious`
myresultmk$`Family Oriented` <- myresult$`Family Oriented`

# Label a brand 'Yes' on a factor when its mean absolute factor score exceeds 0.75
myresultmk$HealthyDerived <- ifelse(myresultmk$V1 > 0.75, 'Yes', 'No')     # V1 = PA1
myresultmk$NonHealthyDerived <- ifelse(myresultmk$V2 > 0.75, 'Yes', 'No')  # V2 = PA2
myresultmk$FamilyDerived <- ifelse(myresultmk$V3 > 0.75, 'Yes', 'No')      # V3 = PA3
myresultmk

##        Group.1        V1        V2        V3 Healthy
## 1      AllBran 0.6749282 0.7349573 1.0559282     Yes
## 2      CMuesli 0.9974637 0.7478035 0.6941258     Yes
## 3   CornFlakes 0.9150257 0.7442646 0.7249563     Yes
## 4    JustRight 0.7102331 0.7527087 0.6056755     Yes
## 5     Komplete 0.7004994 0.5258570 1.2313606     Yes
## 6   NutriGrain 0.8452656 1.1056192 0.5889176     Yes
## 7      PMuesli 0.7108973 0.8574756 0.9349777     Yes
## 8  RiceBubbles 0.9838677 0.6355220 1.0075413      No
## 9     SpecialK 0.7234387 0.4538966 0.6083918     Yes
## 10     Sustain 0.8404615 0.5869119 0.7631423     Yes
## 11    Vitabrit 0.5985314 0.9789122 0.4783065     Yes
## 12    Weetabix 0.4592862 0.8696765 0.5457632     Yes
##    Non Caclorie Conscious Family Oriented HealthyDerived NonHealthyDerived
## 1                      No              No             No                No
## 2                      No             Yes            Yes                No
## 3                      No             Yes            Yes                No
## 4                      No             Yes             No               Yes
## 5                      No              No             No                No
## 6                     Yes             Yes            Yes               Yes
## 7                      No             Yes             No               Yes
## 8                      No             Yes            Yes                No
## 9                      No             Yes             No                No
## 10                     No             Yes            Yes                No
## 11                     No             Yes             No               Yes
## 12                     No             Yes             No               Yes
##    FamilyDerived
## 1            Yes
## 2             No
## 3             No
## 4             No
## 5            Yes
## 6             No
## 7            Yes
## 8            Yes
## 9             No
## 10           Yes
## 11            No
## 12            No
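
One point worth checking when reading the derived columns: factor scores can be negative as well as positive, and averaging their absolute values measures how far a brand sits from the centre, not in which direction. The sketch below uses made-up scores for two hypothetical brands (X and Y) to show the difference; it is an illustration of that design choice, not a re-analysis of the data above.

```r
# Hypothetical standardized factor scores for two brands (illustration only)
scores <- data.frame(
  brand = c("X", "X", "X", "Y", "Y", "Y"),
  PA1   = c(1.2, 0.9, 1.0, -1.2, -0.9, -1.0)
)

# Mean of absolute scores, as in the aggregation above
mean_abs <- aggregate(abs(PA1) ~ brand, data = scores, FUN = mean)
# Mean of signed scores, which keeps the direction of the rating
mean_signed <- aggregate(PA1 ~ brand, data = scores, FUN = mean)

print(mean_abs)
print(mean_signed)
```

In this toy case both brands get the same mean absolute score and would both be labelled Yes at the 0.75 threshold, although brand Y scores consistently below average on the factor. This may explain some of the disagreements between the survey-mean labels and the derived labels in the table above.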

Amit Kayal

November 19, 2017