INTRODUCTION

Food companies, as well as firms in other sectors such as personal care and even automobile manufacturing, combine sensory evaluation and consumer research to help manufacturing and marketing advance the company’s objectives. The size and complexity of the resulting databases vary widely. Many statistical techniques are used to relate sensory descriptive data to consumer responses such as overall liking (on a 9-point hedonic scale) or purchase intention. These techniques include factor analysis, partial least squares regression, multiple linear regression, and Principal Components Analysis (PCA). Such databases rely on information from two different panels: 1) a trained panel capable of describing all or the most relevant sensory characteristics of the products, and 2) a consumer panel that assesses the products according to personal liking and preferences.

One of the most important uses of PCA is the exploration of complex multivariate datasets. For this study, a dataset containing sensory descriptive data (37 attributes) and consumer overall liking scores (80 consumers) for a set of 11 white corn tortilla chips was obtained from the database of The Sensometric Society. PCA, along with factor analysis, offers tools to find which sensory descriptors are correlated with the products and with the consumers. Moreover, it shows which descriptors influence liking (drivers of liking), as opposed to those that do not drive liking and should receive less consideration in product development.

The first objective of this project (Part 1) is to apply PCA to quickly explore the drivers of liking and other interesting relationships in the white corn tortilla chip category in a semi-unsupervised manner, complementing the regression models that could also be employed. Part 2 explores potential product clusters using the pam algorithm. Cluster analysis is usually used to segment consumers, but in this project it is used to help PCA separate products.

MATERIALS AND METHODS

Dataset 1:

library(readr)
chips <- read_csv("C:/Users/kennethmariano/Dropbox/2017/7142/Final Project/chips.csv")
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   ProductID = col_character(),
##   sweet = col_double(),
##   salt = col_double(),
##   sour = col_double(),
##   lime = col_double(),
##   astringent = col_double(),
##   graincomplex = col_double(),
##   toastedcorn = col_double(),
##   rawcorn = col_double(),
##   masa = col_double(),
##   toastedgrain = col_double(),
##   painty = col_double(),
##   feedy = col_double(),
##   heatedoil = col_double(),
##   scorched = col_double(),
##   cardboard = col_double(),
##   sourgrain = col_double(),
##   microrough = col_double(),
##   macrorough = col_double(),
##   oilygreasylip = col_double()
##   # ... with 18 more columns
## )
## See spec(...) for full column specifications.
dim(chips)
## [1]  11 118

11 rows, one per product:

chips$ProductID      
##  [1] "BYW" "GMG" "GUY" "MED" "MST" "MTR" "OCF" "SAN" "TBS" "TOM" "TRS"
 chips[,1:5]
## # A tibble: 11 x 5
##    ProductID sweet  salt  sour  lime
##        <chr> <dbl> <dbl> <dbl> <dbl>
##  1       BYW  0.47  8.76  0.04  0.00
##  2       GMG  0.51  7.22  0.14  0.22
##  3       GUY  0.53  7.84  0.09  0.00
##  4       MED  0.44  6.88  0.10  0.00
##  5       MST  0.63  9.44  0.09  0.00
##  6       MTR  0.44  9.88  0.01  0.00
##  7       OCF  0.65  7.30  0.31  0.00
##  8       SAN  0.62  9.01  0.00  0.00
##  9       TBS  0.40  8.91  0.02  0.00
## 10       TOM  0.42  8.54  0.02  0.00
## 11       TRS  0.62  7.99  0.06  0.00
chips [,38:43]
## # A tibble: 11 x 6
##    spots    C1    C2    C3    C4    C5
##    <dbl> <int> <int> <int> <int> <int>
##  1  10.5     8     4     6     7     6
##  2   5.0     3     3     6     7     6
##  3  13.0     6     6     7     6     6
##  4  10.5     3     4     3     4     8
##  5   8.0     9     4     7     8     2
##  6   8.0     4     7     7     9     7
##  7  11.0     4     8     7     8     6
##  8  10.5     7     8     9     4     8
##  9   9.5     8     4     9     7     6
## 10  12.0     7     9     6     8     6
## 11  10.0     2     6     9     7     8

Columns: the sensory descriptors, followed by one “overall liking” score column per consumer:

head(chips, 1)
## # A tibble: 1 x 118
##   ProductID sweet  salt  sour  lime astringent graincomplex toastedcorn
##       <chr> <dbl> <dbl> <dbl> <dbl>      <dbl>        <dbl>       <dbl>
## 1       BYW  0.47  8.76  0.04     0       2.55          6.9        2.72
## # ... with 110 more variables: rawcorn <dbl>, masa <dbl>,
## #   toastedgrain <dbl>, painty <dbl>, feedy <dbl>, heatedoil <dbl>,
## #   scorched <dbl>, cardboard <dbl>, sourgrain <dbl>, microrough <dbl>,
## #   macrorough <dbl>, oilygreasylip <dbl>, looseparticles <dbl>,
## #   hardness <dbl>, crispness <dbl>, fracturability <dbl>,
## #   cohesivemass <dbl>, roughofmass <dbl>, moistofmass <dbl>,
## #   moistabsorp <dbl>, persistcrisp <dbl>, toothpack <dbl>,
## #   looseparticles1 <dbl>, oilygreasyfilm <dbl>, DegreeofWhiteness <dbl>,
## #   GrainFlecks <dbl>, CharMarks <dbl>, MicroSurfaceParticles <dbl>,
## #   AmountofBubbles <dbl>, spots <dbl>, C1 <int>, C2 <int>, C3 <int>,
## #   C4 <int>, C5 <int>, C6 <int>, C7 <int>, C8 <int>, C9 <int>, C10 <int>,
## #   C11 <int>, C12 <int>, C13 <int>, C14 <int>, C15 <int>, C16 <int>,
## #   C17 <int>, C18 <int>, C19 <int>, C20 <int>, C21 <int>, C22 <int>,
## #   C23 <int>, C24 <int>, C25 <int>, C26 <int>, C27 <int>, C28 <int>,
## #   C29 <int>, C30 <int>, C31 <int>, C32 <int>, C33 <int>, C34 <int>,
## #   C35 <int>, C36 <int>, C37 <int>, C38 <int>, C39 <int>, C40 <int>,
## #   C41 <int>, C42 <int>, C43 <int>, C44 <int>, C45 <int>, C46 <int>,
## #   C47 <int>, C48 <int>, C49 <int>, C50 <int>, C51 <int>, C52 <int>,
## #   C53 <int>, C54 <int>, C55 <int>, C56 <int>, C57 <int>, C58 <int>,
## #   C59 <int>, C60 <int>, C61 <int>, C62 <int>, C63 <int>, C64 <int>,
## #   C65 <int>, C66 <int>, C67 <int>, C68 <int>, C69 <int>, C70 <int>, ...

DATA PROCESSING: Only the last two consumers had missing values, so they were omitted from the analysis. The data were converted into a matrix.

r <- as.matrix(chips)
rownames(r) = (chips$ProductID)
r2 <- (r[,1:118])
#head(r2)
#summary(r2)
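The omission of consumers with missing scores can also be made explicit instead of relying on column positions. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not the chips data), assuming consumer columns are named `C` followed by a number:

```r
# Toy data (hypothetical values): 3 products, 2 descriptors, 3 consumers.
toy <- data.frame(
  ProductID = c("A", "B", "C"),
  sweet     = c(0.4, 0.5, 0.6),
  salt      = c(8.1, 7.2, 9.0),
  C1        = c(7L, 3L, 6L),
  C2        = c(4L, NA, 6L),   # this consumer skipped one product
  C3        = c(6L, 6L, 7L)
)

# Flag consumer columns (names like C1, C2, ...) that contain
# at least one missing score.
is_consumer <- grepl("^C[0-9]+$", names(toy))
has_na      <- colSums(is.na(toy)) > 0
drop_cols   <- names(toy)[is_consumer & has_na]

# Keep everything except the incomplete consumers.
toy_complete <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]
print(drop_cols)
print(names(toy_complete))
```

Selecting by name in this way keeps working if the incomplete consumers are not the last columns of the table.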


library(devtools)
install_github("vqv/ggbiplot")
## Skipping install of 'ggbiplot' from a github remote, the SHA1 (7325e880) has not changed since last install.
##   Use `force = TRUE` to force installation
library(ggbiplot)
## Loading required package: ggplot2
## Loading required package: plyr
## Loading required package: scales
## 
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
## 
##     col_factor
## Loading required package: grid

After the conversion to a data frame, all the variables were factors, so they were written to a file and read back in as “numeric”:

r3 <-as.data.frame(r2)
library(data.table)
fwrite(r3, "r6")
r3 <- fread ("r6", colClasses = "numeric")
#str(r3)
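The round trip through a file works, but the same conversion can probably be done in memory. A minimal sketch, assuming a hypothetical factor column `f` like the ones produced when a character matrix becomes a data frame with factors:

```r
# Toy factor column (hypothetical values).
f <- factor(c("8.76", "7.22", "9.44"))

# as.numeric(f) alone returns the internal level codes, not the values.
codes  <- as.numeric(f)
# Going through character first recovers the original numbers.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

The difference between `codes` and `values` is the usual pitfall here: applying `as.numeric()` directly to a factor silently replaces the measurements with level indices.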

RESULTS AND DISCUSSION

PCA on the 37 descriptors only

chips_pca<- prcomp(r3[,2:38], scale=T)
pve <-(chips_pca$sdev)^2
pve <- pve/sum((chips_pca$sdev)^2)
plot (1:length(pve), pve, xlab ="Number of PCs",
      ylab = "PVE", type = "b")

- Three or four principal components describe most of the variance of the 37 descriptors.
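The reading of the scree plot above can be backed by a cumulative proportion of variance explained. This is a minimal sketch using the built-in `USArrests` data as a stand-in for the chips descriptors; the 80% threshold is an arbitrary assumption chosen only for illustration:

```r
# Built-in dataset used as a stand-in for the chips descriptors.
pca <- prcomp(USArrests, scale. = TRUE)

# Proportion of variance explained (PVE) by each component.
pve <- pca$sdev^2 / sum(pca$sdev^2)
# Running total: how much variance the first k components capture.
cum_pve <- cumsum(pve)

# Smallest number of components reaching the (arbitrary) 80% threshold.
n_keep <- which(cum_pve >= 0.80)[1]
print(round(cum_pve, 3))
print(n_keep)
```

Applied to `chips_pca`, the same two lines for `pve` and `cum_pve` would show how many components are needed to reach a chosen share of the variance.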

PCA biplot showing products and sensory descriptors

biplot(chips_pca, col = c("blue", "red"), choices=c(1,2), scale=0)

g <- ggbiplot(chips_pca, obs.scale = 1, var.scale = 1, 
              ellipse = TRUE, 
              circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal', 
               legend.position = 'top')
print(g)

- Products are fairly different except for two pairs, a desirable quality for further modeling.
- The 11 products are a good representation of the product category or space.
- The product characteristics are spread across the product space.

Biplot showing correlation of consumer’s liking and sensory descriptors

# c1: consumers (rows) x descriptors (columns); c2 is its transpose
c1 <- cor(r3[, 39:116], r3[, 2:38])
c2 <- cor(r3[, 2:38], r3[, 39:116])
dim(c1)
## [1] 78 37
# PCA on the correlation matrix in both orientations
chips_pcajudge <- prcomp(c1[, 1:37], scale=T)
chips_pcajudge2 <- prcomp(c2[, 1:78], scale=T)
biplot(chips_pcajudge, col = c("blue", "red"), choices=c(1,2), scale=0)

chips_pcajudget <- prcomp(c1[1:78, ], scale=T)
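The two correlation matrices above differ only in orientation. A minimal sketch on simulated stand-in data (the matrices `descriptors` and `liking` and their values are hypothetical) illustrates what `cor(x, y)` returns when both arguments have several columns:

```r
set.seed(1)
# Hypothetical stand-in data: 11 products (rows),
# 3 descriptors and 4 consumers (columns).
descriptors <- matrix(rnorm(11 * 3), nrow = 11,
                      dimnames = list(NULL, c("salt", "crisp", "sour")))
liking <- matrix(sample(1:9, 11 * 4, replace = TRUE), nrow = 11,
                 dimnames = list(NULL, paste0("C", 1:4)))

# cor(x, y) correlates every column of x with every column of y,
# so rows of the result are consumers and columns are descriptors.
c1 <- cor(liking, descriptors)
# Swapping the arguments gives the transpose (the "inverted case").
c2 <- cor(descriptors, liking)

print(dim(c1))
print(all.equal(c1, t(c2)))
```

Running PCA on one orientation treats consumers as observations and descriptors as variables; the other orientation swaps those roles.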

Correlation of consumers and descriptors: another view

g <- ggbiplot(chips_pcajudget, obs.scale = 1, var.scale = 1, 
              ellipse = TRUE, 
              circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal', 
               legend.position = 'top')
print(g)

Products and consumer liking

chips_pcajudgeproduct<- prcomp(r3[,39:116], scale=T)
 biplot(chips_pcajudgeproduct, col = c("blue", "green"), choices=c(1,2), scale=0)

Products 9, 8, 6, and 11 are more associated with liking.

Biplot of the correlation of descriptors and consumers (the inverted case)

g <- ggbiplot(chips_pcajudge2, obs.scale = 1, var.scale = 1, 
              ellipse = TRUE, 
              circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal', 
               legend.position = 'top')
print(g)

Consumers are heavily associated with certain attributes, as shown by a region of the plot with a high density of consumer “vectors”. But the products also play a role. Because there is evidence that consumer liking is associated with specific descriptors, I decided to plot descriptors and liking together with the products.

chips_pca2<- prcomp(r3[,2:116], scale=T)
pve <-(chips_pca2$sdev)^2
pve <- pve/sum((chips_pca2$sdev)^2)
plot (1:length(pve), pve, xlab ="Number of PCs",
      ylab = "PVE", type = "b")

biplot(chips_pca2, col = c("blue", "green"), choices=c(1,2), scale=0)

g <- ggbiplot(chips_pca2, obs.scale = 1, var.scale = 1, 
              ellipse = TRUE, 
              circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal', 
               legend.position = 'top')
print(g)

Consumer liking vectors tend to point to the left side of the plot, together with the descriptors crispness, fracturability, surface particles, saltiness, toasted corn flavor, and roughness of mass.

Descriptors associated with lower liking: sour grain flavor, painty, hardness, cohesiveness of mass, and feedy.
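A simple numeric check can complement what is read off the biplot: correlate each descriptor with the mean liking per product and sort the result. This is not the method used in this report, only a sketch on simulated data in which liking is constructed to rise with `crispness` and fall with `painty` (all values are hypothetical):

```r
set.seed(42)
# Hypothetical stand-in data for 11 products, not the chips dataset.
crispness <- seq(-1, 1, length.out = 11)
descriptors <- data.frame(
  crispness = crispness,
  painty    = crispness^2 - mean(crispness^2),
  salt      = rnorm(11)
)
# Simulated liking for 5 consumers: rises with crispness, falls with painty.
liking <- sapply(1:5, function(i) {
  5 + 1.5 * descriptors$crispness - 1.5 * descriptors$painty +
    rnorm(11, sd = 0.3)
})

# Average liking per product, then its correlation with each descriptor.
mean_liking <- rowMeans(liking)
driver_cor  <- sapply(descriptors, function(d) cor(d, mean_liking))

# Sort from strongest positive to strongest negative association.
print(round(sort(driver_cor, decreasing = TRUE), 2))
```

With only 11 products such correlations are rough indications, so they are best read alongside the biplot rather than instead of it.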

CONCLUSIONS (PART 1)

We could separate the descriptors that were associated or dissociated with the liking of the tortilla chips. The biplot is limited to two dimensions, and in several cases more than two principal components (PCs) would be needed given the low variance explained by the first two PCs, although these two always explain more variance than any other pair of components.
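The two-dimensional limit of the biplot can be worked around by plotting other pairs of components. A minimal sketch, again using the built-in `USArrests` data as a stand-in, compares the variance captured by two planes and draws the second one through the `choices` argument:

```r
pca <- prcomp(USArrests, scale. = TRUE)
pve <- pca$sdev^2 / sum(pca$sdev^2)

# Variance captured by the plane of PCs 1-2 versus PCs 3-4.
plane_12 <- sum(pve[1:2])
plane_34 <- sum(pve[3:4])
print(round(c(PC1_PC2 = plane_12, PC3_PC4 = plane_34), 3))

# Same kind of biplot call as in the report, but showing components 3 and 4.
biplot(pca, choices = c(3, 4), scale = 0)
```

Because components are ordered by variance, the first plane always carries at least as much variance as any later one, which is why later planes tend to be harder to interpret.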

APPENDIX OF PART 1

chips_pcajudgeproduct
## Standard deviations (1, .., p=11):
##  [1] 4.313949e+00 3.590799e+00 3.100144e+00 2.968644e+00 2.643962e+00
##  [6] 2.273684e+00 2.269728e+00 2.099977e+00 1.837996e+00 1.724031e+00
## [11] 1.765053e-15
## 
## Rotation (n x k) = (78 x 11):
##              PC1           PC2           PC3          PC4           PC5
## C1  -0.097836755 -0.0957684428  1.276612e-01  0.126721563 -0.1713686745
## C2  -0.059584014  0.0670050967 -6.791635e-03 -0.207474880 -0.0930879072
## C3  -0.179415392 -0.0468884633  1.026637e-01 -0.076214756  0.1404818245
## C4  -0.024400694 -0.1597265434  3.954129e-02 -0.169857459 -0.0147989689
## C5  -0.024018741  0.1977398301 -1.668875e-01 -0.079027611  0.1190296249
## C6  -0.128697054 -0.0088698665 -1.138576e-01  0.130265009 -0.0154137727
## C7  -0.067113719 -0.1417762146  8.432768e-02  0.164686959  0.0603990050
## C8  -0.154103225 -0.0588834249  1.833894e-03 -0.146675540  0.0804086058
## C9   0.045609014 -0.1486857658  3.084461e-02 -0.171479000  0.1546059456
## C10 -0.141633629 -0.1696225977  7.619432e-02  0.106649597 -0.0177246976
## C11 -0.073933989 -0.0945182810  2.713837e-02  0.008521640 -0.0170573945
## C12 -0.046180769  0.0340504261 -1.348716e-01  0.192399166 -0.1311556165
## C13 -0.162222229 -0.1256407293 -1.700747e-02  0.135560416 -0.0097002506
## C14  0.023464895 -0.1648524924 -1.356138e-01 -0.199067368  0.0161697276
## C15 -0.117138448 -0.1542820534 -8.095780e-02 -0.065153267  0.0338589200
## C16 -0.067324345  0.1207805134  1.800837e-01  0.104049069 -0.0808545427
## C17 -0.089641515 -0.1588305586 -1.710202e-02 -0.117398538  0.1019065971
## C18 -0.160891402  0.1438074556  4.440691e-03  0.031706634 -0.0707590123
## C19 -0.043254589 -0.1390365907 -1.512334e-01 -0.106521358 -0.0934317495
## C20 -0.049211102 -0.1610025116  7.865076e-02  0.002001799  0.2312085880
## C21 -0.108556112 -0.0248680085 -1.340482e-01 -0.091398262 -0.0416383388
## C22 -0.102081265  0.1401926450 -1.903442e-01 -0.037320764  0.0782209768
## C23 -0.114248311 -0.0648775638 -8.961040e-03 -0.122995784  0.0470134304
## C24 -0.172723158  0.0552838713  7.054330e-02 -0.013306737  0.0489741059
## C25 -0.137298454 -0.0181483658 -1.358536e-01  0.038880587 -0.1147966718
## C26 -0.112157696  0.0855178886 -1.552048e-03 -0.216486093  0.0580968233
## C27  0.064647452  0.0100889024  1.312774e-01  0.081834981  0.1645027756
## C28  0.013420439  0.2000816200  1.127556e-01  0.052253007  0.2040246864
## C29 -0.124667091  0.1754158796  5.519171e-02 -0.085695061 -0.0117234793
## C30 -0.130046953  0.0891866893 -1.638463e-01 -0.030697146  0.0462243027
## C31 -0.158061598 -0.1309886043  4.003588e-02 -0.009158586 -0.0415765274
## C32 -0.187559810  0.0612428347 -1.081196e-01  0.038369862 -0.1013811187
## C33 -0.090596596  0.1453739611 -8.517465e-02  0.008727334  0.2201822522
## C34 -0.078606664  0.1631970498  2.101435e-01 -0.063912561  0.0356405440
## C35 -0.132260018  0.2089213206  3.181631e-03 -0.019175689 -0.0321956764
## C36 -0.132887548 -0.0863660467  4.570115e-02 -0.133219837  0.0561983339
## C37 -0.037467825 -0.0205943414 -1.385635e-01  0.115769004  0.0139000427
## C38 -0.049972207 -0.1719674545 -6.867611e-02  0.106304877  0.0474733562
## C39 -0.010810633  0.1160143444  9.267973e-02 -0.131595103  0.1828784479
## C40 -0.171355098 -0.0145476565  1.530686e-01  0.024646480  0.0733725995
## C41 -0.115329119 -0.0891225226 -1.868762e-01  0.070370425 -0.0106434758
## C42 -0.131397833  0.0680454788 -9.995211e-02  0.194945268  0.1352565820
## C43 -0.156210445  0.0423482765  5.272824e-02  0.135977896 -0.1508326952
## C44 -0.126986710  0.0928982394 -8.093487e-03  0.182746639  0.0555943631
## C45 -0.093324139  0.0050312415 -1.930264e-02  0.059679802 -0.1557597966
## C46 -0.052937182 -0.2425034023 -1.220519e-01 -0.007791311  0.0094994237
## C47 -0.164077983 -0.0367791951  8.822508e-05 -0.130676812  0.0627375427
## C48 -0.122635052  0.1089335383  1.742140e-01 -0.009634586 -0.1176542545
## C49 -0.085776129 -0.0663101786  2.563305e-01 -0.035941303  0.0385090423
## C50 -0.145922314 -0.0656088945  1.821510e-01 -0.104136276  0.0082734857
## C51 -0.120724784  0.0668937033  1.930513e-01 -0.072183108  0.0514850768
## C52 -0.132544309 -0.1394817103  9.868060e-03 -0.146965559  0.0552783780
## C53 -0.040004892 -0.0973166203  7.140952e-02 -0.015225717 -0.0707680601
## C54 -0.032998581  0.0432664583 -1.066028e-01 -0.171627341  0.2592719923
## C55 -0.137244803  0.1338256616 -6.784176e-02 -0.142348846  0.1227038696
## C56  0.088841063 -0.1496560103  1.087585e-01  0.118710416 -0.0454468864
## C57  0.025517139  0.0410517640 -7.067986e-02  0.105124861  0.0692157118
## C58  0.017099202  0.0924748101  9.474657e-03 -0.099911932 -0.1710842235
## C59  0.057765272  0.0006373724 -2.232017e-01 -0.143200033 -0.0279873184
## C60  0.084850607 -0.0660488611  1.696930e-01 -0.033437886  0.2463970466
## C61 -0.127741959  0.1199295658 -3.667680e-02 -0.033144936 -0.0309423482
## C62 -0.018015963  0.1380835616  7.164449e-04  0.142723679  0.2602419638
## C63 -0.104832279 -0.1312437050 -1.906983e-01  0.047967424 -0.0165761437
## C64 -0.199955786  0.0420417951 -2.450818e-03 -0.100956853  0.0001409529
## C65 -0.150923678  0.1122467283 -4.372720e-02  0.086397698 -0.1291115213
## C66 -0.063662236  0.0928631997 -1.605194e-01  0.152207675  0.2117044319
## C67 -0.148556463 -0.0244833538  1.819379e-01  0.021537149 -0.0894717224
## C68 -0.005391266 -0.1873037908  6.894464e-04  0.133145091  0.1453306265
## C69 -0.159126843  0.0108104465  1.234046e-01  0.084953821  0.0158365970
## C70 -0.175627942 -0.0414491444 -2.344194e-02  0.177505904 -0.0008458169
## C71  0.036058090  0.0414489032  1.710002e-01 -0.151485579 -0.1010784380
## C72  0.005624086 -0.1955143322  1.203285e-01  0.093483704  0.1120754122
## C73 -0.197494291  0.0116860577 -4.131807e-03 -0.016936250 -0.0474161350
## C74 -0.198810004 -0.0461548583  4.455851e-02  0.025336669 -0.1030300528
## C75 -0.097764478 -0.0186129327 -1.556585e-01  0.075799211  0.2051961858
## C76 -0.055426044 -0.1557610994 -1.319003e-01 -0.163970575 -0.0580928658
## C77 -0.109379242 -0.1138995474  5.110095e-02  0.147458864  0.1778933240
## C78 -0.125649120  0.0006994273 -1.216137e-02 -0.172964987 -0.1784027101
##              PC6          PC7          PC8          PC9         PC10
## C1  -0.010883371 -0.095165362 -0.179219004  0.040974118 -0.043318844
## C2  -0.102103224  0.097150870 -0.175243664  0.074714345  0.245307994
## C3  -0.013163594 -0.032299702  0.033921009 -0.023394468  0.152310687
## C4   0.040876010  0.196624351 -0.045668588 -0.045503593 -0.235713968
## C5   0.001147704 -0.086761787  0.013350142  0.053550200  0.068832390
## C6  -0.249670319  0.112440402 -0.005114131  0.024414383  0.092511687
## C7   0.050682522 -0.132576460 -0.182428163 -0.051119125  0.147560348
## C8  -0.203005096 -0.003747257 -0.050548073 -0.116047542 -0.053899615
## C9   0.115272435  0.073549986 -0.074947917 -0.188019036  0.005799097
## C10  0.072493186 -0.011264409  0.087962257  0.088376956 -0.055452720
## C11  0.070551887  0.351183420  0.088185936  0.147398608 -0.039138095
## C12 -0.007578001 -0.175435742  0.185614201 -0.062002331 -0.009773245
## C13 -0.080621802 -0.004683619 -0.125749684 -0.035252285  0.106452029
## C14 -0.056382257  0.054140433  0.058437326  0.047571598  0.138833431
## C15 -0.168094191  0.025386278  0.190198290  0.077136349 -0.009930280
## C16  0.077789410 -0.096562634  0.207488431  0.026689395  0.032726604
## C17 -0.189139265  0.019089115  0.147218058  0.076570832  0.093732984
## C18 -0.055803718  0.108220704  0.102997468  0.116972973  0.113242016
## C19 -0.177966201 -0.097806359  0.028006568  0.187278093 -0.006688631
## C20  0.178436971  0.013663032 -0.059841312  0.034674401  0.026585073
## C21 -0.257189126  0.020392746 -0.004803032 -0.222985712 -0.026635593
## C22 -0.093699998 -0.004531618  0.086381554  0.039677212  0.148652679
## C23  0.147283342 -0.067458765  0.084664176  0.207314824  0.284680549
## C24  0.077582671  0.218969858 -0.116564294 -0.016827847 -0.008861146
## C25  0.209707091 -0.058351452  0.103985021  0.067955337  0.134711555
## C26  0.147498036 -0.118195027  0.036469428  0.024036782 -0.119860690
## C27 -0.165420157  0.010863965  0.286550308 -0.014069229  0.027496729
## C28 -0.001077249 -0.061800413  0.007811697  0.033169397  0.081130490
## C29 -0.017091403 -0.118194467 -0.155144856  0.010372062  0.114646306
## C30  0.035950152 -0.183140645  0.163597960 -0.013256702 -0.033604098
## C31  0.042607241  0.176651994  0.110993982  0.107892362 -0.085625136
## C32  0.036730549 -0.019822099  0.089346033  0.112427988 -0.065832281
## C33 -0.011318459  0.070609664 -0.001759781  0.168535987  0.120480059
## C34 -0.054274572  0.035717086 -0.006521407 -0.014377462 -0.129203430
## C35 -0.115468985  0.021444343 -0.019322500  0.033330407 -0.090579355
## C36  0.191041896 -0.035960980 -0.109666718 -0.138158802  0.146473599
## C37 -0.017047694  0.279902669  0.115138337 -0.236627412  0.062698366
## C38  0.210680126 -0.137470988 -0.131346140  0.050331780 -0.005486451
## C39 -0.094594256 -0.083162586 -0.108061860  0.160953368 -0.212660598
## C40 -0.001385109 -0.077180904  0.089155228 -0.185487081 -0.015110270
## C41  0.065447045 -0.111531136  0.030128135  0.183460313 -0.149073252
## C42  0.049526146 -0.044952631  0.015138779 -0.091480332  0.051195850
## C43 -0.021547305  0.058082399  0.179258705  0.055647926  0.026586253
## C44 -0.055991532  0.074023845 -0.140162732 -0.170522935  0.119955242
## C45 -0.146807356  0.100253807 -0.313807272  0.058810507 -0.089627285
## C46  0.015725434 -0.014652245  0.092483194 -0.025184670 -0.031805995
## C47  0.206106821 -0.079443638  0.015274146  0.115754042  0.041515043
## C48 -0.131955929 -0.027937873  0.001234569 -0.079138304 -0.145158868
## C49 -0.084732891  0.091569815  0.033265526  0.086005229 -0.120224207
## C50 -0.002151938 -0.147753694  0.053308620 -0.038426544  0.033254910
## C51  0.079778753 -0.174266428 -0.068862823 -0.104359488 -0.021532173
## C52 -0.160815578 -0.033774138  0.020226633  0.141647358 -0.011659206
## C53 -0.090231909 -0.327286923  0.125311387 -0.024647605 -0.180486216
## C54 -0.041907725  0.070844735  0.041401873 -0.092518001 -0.123670741
## C55  0.083333626 -0.072802522 -0.055318541 -0.030806888 -0.059629426
## C56  0.024649489 -0.157846083  0.054528309  0.054664215  0.230060746
## C57  0.260270368  0.180317941  0.075634642 -0.131007347 -0.250391916
## C58  0.180312247  0.006722980  0.152796252 -0.260920799  0.173615779
## C59 -0.087509958 -0.155865744 -0.081857510 -0.151446616  0.019410750
## C60 -0.001005554 -0.010953774  0.123053494 -0.095095617  0.016450081
## C61  0.098486817 -0.030394290 -0.281482687  0.146609240 -0.039508894
## C62 -0.102784477  0.065089524 -0.014335391  0.062860739 -0.038702932
## C63 -0.021826219 -0.080033017 -0.033403327  0.042117613 -0.228583357
## C64  0.089767456  0.108365810 -0.025607067  0.095438757  0.050012697
## C65  0.178966606  0.009743019 -0.039779257  0.005074812 -0.115746689
## C66 -0.023013125  0.026466702  0.090064159  0.013912434 -0.043007752
## C67 -0.156744535 -0.094256872  0.002406211 -0.083747631  0.048111473
## C68 -0.056921349 -0.157652303 -0.133489210 -0.021908439 -0.078320129
## C69 -0.099892425 -0.062419452  0.152967202 -0.193220290 -0.070021394
## C70 -0.029678136 -0.012199455 -0.112808942  0.043456760 -0.133572019
## C71  0.089080207  0.007987386  0.198844799  0.187809948 -0.146838413
## C72 -0.077622879  0.079145708 -0.081628765  0.035300671  0.188435970
## C73  0.134916807  0.094559479  0.111278947 -0.117968275  0.064000422
## C74  0.001759030  0.096953296 -0.057031576 -0.124773804  0.087264049
## C75  0.011949590 -0.151185772 -0.016211480 -0.177070481 -0.069232678
## C76  0.087351477  0.007810613  0.010191311 -0.189486432 -0.117084696
## C77  0.082273430  0.150022022 -0.011267333  0.043368691 -0.069308313
## C78 -0.031363045  0.066740608 -0.010585837 -0.236797634  0.014922744
##             PC11
## C1  -0.136986439
## C2   0.168061226
## C3  -0.203946733
## C4   0.110471897
## C5  -0.090275560
## C6   0.218049710
## C7   0.249435260
## C8   0.092464224
## C9   0.050190796
## C10  0.049361992
## C11  0.022108329
## C12  0.051508280
## C13 -0.090512589
## C14 -0.096193602
## C15  0.068000545
## C16 -0.074029294
## C17  0.055106641
## C18 -0.044681538
## C19 -0.103594332
## C20  0.092448329
## C21 -0.044290119
## C22  0.101407995
## C23  0.099310376
## C24  0.086225202
## C25 -0.011187472
## C26 -0.021995085
## C27 -0.267340631
## C28  0.166838565
## C29 -0.261425653
## C30  0.021496142
## C31  0.121487663
## C32  0.051462974
## C33  0.039124181
## C34  0.088160440
## C35 -0.010632411
## C36  0.045846282
## C37  0.049138285
## C38 -0.127139626
## C39  0.109936715
## C40  0.167302706
## C41  0.018245758
## C42 -0.036416897
## C43 -0.041152863
## C44 -0.064752064
## C45 -0.061403176
## C46 -0.018903000
## C47 -0.183328732
## C48 -0.011365516
## C49 -0.045410373
## C50  0.135999293
## C51 -0.093539266
## C52 -0.079960406
## C53  0.113393847
## C54 -0.068496654
## C55 -0.008960045
## C56  0.186790236
## C57 -0.016791785
## C58  0.122665357
## C59  0.111978768
## C60 -0.136685814
## C61  0.133467603
## C62 -0.035141226
## C63 -0.192998615
## C64 -0.036635430
## C65 -0.123287114
## C66  0.066659204
## C67 -0.096060431
## C68 -0.056395769
## C69  0.091157579
## C70  0.208449791
## C71  0.174198419
## C72 -0.135759803
## C73 -0.149278060
## C74  0.033372207
## C75  0.122306713
## C76  0.015642821
## C77 -0.048992971
## C78 -0.124546973

Principal component 1 is associated with texture and some additional flavors, whereas PC2 deals with chip-specific flavors.

PART 2: In this section each row represents a consumer evaluating one product. To save time, all the descriptive data were attached to each consumer (repeated 78 times, since the last two consumers, who had missing data, were eliminated). Besides overall liking (previously stored as one column per consumer, e.g. C1 to C78), measured on a 9-point hedonic scale (1 = extremely dislike, 9 = extremely like), other hedonic measurements were included, e.g. texture, flavor, and saltiness liking (9-point hedonic scale), and purchase intent, measured on a 5-point scale (5 = very likely to buy). Each of these variables is now a column, and each row is a product*panelist combination carrying the hedonic variables together with the descriptive terms for that product. In the new setting each consumer is denoted by panID. Product clusters and consumer clusters will be explored.
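The change of layout described above, from one liking column per consumer to one row per product and consumer, can be sketched in base R. The small table below uses three products, one descriptor, and two consumers taken from the Part 1 printouts; the object names `wide` and `long` are assumptions for illustration, and this is not necessarily how `chips3.csv` was built:

```r
# Small wide table: one row per product, one liking column per consumer.
wide <- data.frame(
  ProductID = c("BYW", "GMG", "GUY"),
  salt      = c(8.76, 7.22, 7.84),
  C1        = c(8L, 3L, 6L),
  C2        = c(4L, 3L, 6L),
  stringsAsFactors = FALSE
)
consumer_cols <- c("C1", "C2")

# Stack the consumer columns: one row per product x consumer combination.
long <- do.call(rbind, lapply(seq_along(consumer_cols), function(i) {
  data.frame(
    ProductID = wide$ProductID,
    panID     = i,
    Liking    = wide[[consumer_cols[i]]],
    salt      = wide$salt,          # descriptor repeated for every consumer
    stringsAsFactors = FALSE
  )
}))

print(dim(long))
print(long)
```

With 11 products and 78 consumers the same stacking gives 858 rows, which matches the dimensions reported for `chips3`.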

DATA PREPARATION

 library(readr)
chips3 <- read_csv("C:/Users/kennethmariano/Dropbox/2017/7142/Final Project/chips3.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   Product = col_character(),
##   ProductID = col_character(),
##   panID = col_integer(),
##   Liking = col_integer(),
##   Appearance = col_character(),
##   Flavor = col_character(),
##   Texture = col_character(),
##   PurchaseIntent = col_character()
## )
## See spec(...) for full column specifications.
dim(chips3)
## [1] 858  45
class(chips3)
## [1] "tbl_df"     "tbl"        "data.frame"
chips3 <- as.matrix(chips3)
chips4 <-as.data.frame(chips3)
library(data.table)
fwrite(chips4, "r6")
chips4 <- fread ("r6", colClasses = "numeric")
#str(r3)
head(chips4)
##    Product ProductID panID Liking Appearance Flavor Texture PurchaseIntent
## 1:    P542       BYW     1      8          6      7       7              4
## 2:    P225       GMG     1      3          8      3       4              1
## 3:    P331       GUY     1      6          4      5       8              3
## 4:    P961       MED     1      3          4      2       4              1
## 5:    P580       MST     1      9          9      9       9              5
## 6:    P375       MTR     1      4          9      4       5              2
##    sweet salt sour lime astringent graincomplex toastedcorn rawcorn masa
## 1:  0.47 8.76 0.04 0.00       2.55         6.90        2.72    0.00 3.71
## 2:  0.51 7.22 0.14 0.22       2.45         7.21        2.25    0.18 3.65
## 3:  0.53 7.84 0.09 0.00       2.52         6.62        1.62    0.00 3.79
## 4:  0.44 6.88 0.10 0.00       2.53         6.47        2.29    0.00 3.80
## 5:  0.63 9.44 0.09 0.00       2.65         7.08        3.82    0.00 3.12
## 6:  0.44 9.88 0.01 0.00       2.58         6.91        2.84    0.00 3.17
##    toastedgrain painty feedy heatedoil scorched cardboard sourgrain
## 1:         1.42   0.00     0      4.39     0.00      2.44         0
## 2:         2.18   0.45     0      4.36     0.47      2.04         0
## 3:         2.24   0.00     0      4.47     0.62      2.75         0
## 4:         1.52   0.21     0      4.31     0.00      3.34         0
## 5:         1.24   0.00     0      4.51     2.74      2.68         0
## 6:         1.97   0.00     0      4.35     0.55      2.55         0
##    microrough macrorough oilygreasylip looseparticles hardness crispness
## 1:       8.77       3.20          6.57           6.71     8.82     10.36
## 2:       7.57       3.14          4.92           4.43     9.55      9.56
## 3:       8.56       2.81          5.84           4.96     8.58      9.92
## 4:       8.27       4.41          6.52           6.01     8.95     10.59
## 5:       8.11       4.14          6.43           5.26     8.71     10.24
## 6:       8.74       4.31          5.90           5.81     8.75     10.19
##    fracturability cohesivemass roughofmass moistofmass moistabsorp
## 1:           7.43         2.68        7.64        7.33        9.07
## 2:           7.21         3.33        7.20        7.63        9.19
## 3:           7.77         3.18        7.38        7.14        9.50
## 4:           7.87         3.08        7.45        7.19        9.12
## 5:           7.50         2.89        7.27        7.29        9.64
## 6:           7.71         2.99        7.49        6.88        9.73
##    persistcrisp toothpack looseparticles1 oilygreasyfilm DegreeofWhiteness
## 1:         6.00      5.21            7.21           3.99               6.0
## 2:         5.18      5.21            7.22           3.78               2.0
## 3:         5.06      5.10            6.94           3.85               7.0
## 4:         5.53      5.21            6.96           4.29               6.5
## 5:         4.94      5.37            7.34           3.85               5.0
## 6:         5.29      4.81            7.04           3.94               3.0
##    GrainFlecks CharMarks MicroSurfaceParticles AmountofBubbles spots
## 1:         8.0       2.0                   2.0               6  10.5
## 2:         3.0       7.0                   0.5               6   5.0
## 3:         6.0       7.0                   1.0               6  13.0
## 4:         7.5       3.0                   2.0               6  10.5
## 5:         4.0       5.0                   2.5               5   8.0
## 6:         3.0       4.5                   2.0               5   8.0
chips4$Appearance <- as.numeric(chips4$Appearance)
## Warning: NAs introduced by coercion
chips4$Flavor <- as.numeric(chips4$Flavor)
## Warning: NAs introduced by coercion
chips4$Texture <- as.numeric(chips4$Texture)
## Warning: NAs introduced by coercion
chips4$PurchaseIntent <- as.numeric(chips4$PurchaseIntent)
## Warning: NAs introduced by coercion
chips4$`Saltiness JAR` <- as.numeric(chips4$`Saltiness JAR`)
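The "NAs introduced by coercion" warnings mean that some entries could not be read as numbers. A minimal sketch for locating such entries, using a hypothetical character vector `raw` rather than the actual columns:

```r
# Hypothetical character column with a blank and a stray label.
raw <- c("7", "4", "", "n/a", "9", NA)

# Convert quietly, then locate entries that became NA only through coercion.
num    <- suppressWarnings(as.numeric(raw))
failed <- is.na(num) & !is.na(raw)

print(raw[failed])   # the offending raw strings
print(sum(failed))   # how many values were lost
```

Checking `raw[failed]` before clustering shows whether the lost values are genuine missing responses or entries that could be recoded.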

DATA ANALYSIS

library(cluster)
library(fpc)
# Drop identifier columns so that only the measured variables are clustered
chips5 <- chips4
chips5$ProductID <- NULL
chips5$Product <- NULL
chips5$panID <- NULL
kclusters <- pamk(chips5)
table(kclusters$pamobject$clustering, chips4$ProductID)
##    
##     BYW GMG GUY MED MST MTR OCF SAN TBS TOM TRS
##   1  51   0   0  22   0   0  26  69  65   0  64
##   2   0  78   0   0   0   0   0   0   0   0   0
##   3   0   0  78   0   0   0   0   0   0   0   0
##   4  27   0   0  56  10   2  52   9  13  10  14
##   5   0   0   0   0  68  76   0   0   0   0   0
##   6   0   0   0   0   0   0   0   0   0  68   0
#par(mfrow=c(1,2));plot(kclusters$pamobject)

The pamk function selected 6 clusters. In cluster 6 the product TOM stands alone, although some of its rows were also classified in cluster 4. Cluster 5 mostly represents MST and MTR. Clusters 2 and 3 are complete separations of GMG and GUY, whereas the largest cluster (cluster 1) groups the remaining, most similar products. The cluster plots could not be created, possibly because some arguments were missing; the validity of this table therefore remains in doubt, since the same problem may have affected the table results.
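One way to examine a choice of the number of clusters by hand is to fit `pam` for several values of k and compare the average silhouette width, which, to my understanding, is the kind of criterion `pamk` relies on. This is a minimal sketch on the built-in `iris` measurements as a stand-in; the range 2 to 6 is an arbitrary assumption:

```r
library(cluster)

# Built-in data as a stand-in; standardized so no variable dominates.
x <- scale(iris[, 1:4])

# Fit PAM for a range of k and record the average silhouette width.
k_range <- 2:6
asw <- sapply(k_range, function(k) pam(x, k)$silinfo$avg.width)
names(asw) <- k_range

best_k <- k_range[which.max(asw)]
print(round(asw, 3))
print(best_k)

# Silhouette plot for the selected solution.
fit <- pam(x, best_k)
plot(silhouette(fit))
```

A silhouette plot of this kind could stand in for the cluster plots that failed, since it shows how well each observation sits inside its assigned cluster.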

kclustersx <-pam(chips5, 4)
table(kclustersx$clustering,chips4$ProductID)
##    
##     BYW GMG GUY MED MST MTR OCF SAN TBS TOM TRS
##   1  38   0  10   6   0   0  19  67  64  25  57
##   2   0  78   0   0   0   2   0   0   0   0   0
##   3  40   0  68  72   6   0  59  11  14  53  21
##   4   0   0   0   0  72  76   0   0   0   0   0
#par(mfrow=c(1,2));plot(kclustersx)

Trying only four clusters offers some information: for example, cluster 2 is mostly composed of product GMG, cluster 4 of MST and MTR, and clusters 1 and 3 contain a mixture of the rest of the products.
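The reading of the four-cluster table can be summarized per product as the cluster that holds most of its rows and the share of rows it holds. A minimal sketch, with the counts copied from the table above (the object names are assumptions for illustration):

```r
# Counts copied from the four-cluster table above (rows = clusters).
products <- c("BYW", "GMG", "GUY", "MED", "MST", "MTR",
              "OCF", "SAN", "TBS", "TOM", "TRS")
tab <- matrix(c(
  38,  0, 10,  6,  0,  0, 19, 67, 64, 25, 57,
   0, 78,  0,  0,  0,  2,  0,  0,  0,  0,  0,
  40,  0, 68, 72,  6,  0, 59, 11, 14, 53, 21,
   0,  0,  0,  0, 72, 76,  0,  0,  0,  0,  0
), nrow = 4, byrow = TRUE,
dimnames = list(cluster = 1:4, product = products))

# For each product: the cluster holding most of its rows, and that share.
dominant <- apply(tab, 2, which.max)
share    <- apply(tab, 2, max) / colSums(tab)
print(data.frame(dominant, share = round(share, 2)))
```

Products whose share is close to 1 are cleanly assigned to one cluster, while lower shares indicate products split between clusters 1 and 3.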

Conclusions: I approached this project from an unsupervised mindset, but a degree of supervision could not be avoided. The exploration of drivers of overall liking does involve a supervised philosophy. Nevertheless, valuable information was gained regarding the descriptors that influence liking. When all liking attributes and descriptors are considered for clustering, cluster 1 appears to be the largest, and there are a few other clusters each representing one or two products; this trend was also seen with PCA. Because of the low variance explained by the first two principal components, principal components 3 and 4 were also plotted, but the plots were omitted because they gave no clear information.