INTRODUCTION
Food companies, as well as others such as personal care or even automobile manufacturers, combine sensory evaluation and consumer research to help manufacturing and marketing advance the company’s objectives. The size and complexity of the resulting databases vary widely. Many statistical techniques are used to relate sensory descriptive data to consumer responses such as overall liking (on a 9-point hedonic scale) or purchase intention. These techniques include factor analysis, partial least squares regression, multiple linear regression, and Principal Components Analysis (PCA). Such databases rely on information from two different panels: 1) a trained panel capable of describing all or the most relevant sensory characteristics of the products, and 2) a consumer panel that assesses the products according to personal liking and preferences.
One of the most important uses of PCA is exploring complex multivariate datasets. For this study, a dataset containing sensory descriptive data (37 attributes) and consumer overall liking scores (80 consumers) for a set of 11 white corn tortilla chips was obtained from the database of The Sensometric Society. Principal Components Analysis, along with Factor Analysis, offers tools to find which sensory descriptors are correlated with the products and with the consumers. Moreover, it makes it possible to see which descriptors influence liking (drivers of liking), as opposed to those that do not drive liking and should receive less consideration in product development.
The first objective of this project (Part 1) is to apply PCA to quickly explore the drivers of liking and other interesting relationships in the white corn tortilla chip category in a semi-unsupervised manner, as a complement to the regression models that could also be employed. Part 2 explores potential product clusters using the pam (partitioning around medoids) algorithm. Cluster analysis is usually used to segment consumers, but in this project it is used to help PCA separate the products.
MATERIALS AND METHODS
Dataset 1:
library(readr)
chips <- read_csv("C:/Users/kennethmariano/Dropbox/2017/7142/Final Project/chips.csv")
## Parsed with column specification:
## cols(
## .default = col_integer(),
## ProductID = col_character(),
## sweet = col_double(),
## salt = col_double(),
## sour = col_double(),
## lime = col_double(),
## astringent = col_double(),
## graincomplex = col_double(),
## toastedcorn = col_double(),
## rawcorn = col_double(),
## masa = col_double(),
## toastedgrain = col_double(),
## painty = col_double(),
## feedy = col_double(),
## heatedoil = col_double(),
## scorched = col_double(),
## cardboard = col_double(),
## sourgrain = col_double(),
## microrough = col_double(),
## macrorough = col_double(),
## oilygreasylip = col_double()
## # ... with 18 more columns
## )
## See spec(...) for full column specifications.
dim(chips)
## [1] 11 118
The 11 rows correspond to the 11 products:
chips$ProductID
## [1] "BYW" "GMG" "GUY" "MED" "MST" "MTR" "OCF" "SAN" "TBS" "TOM" "TRS"
chips[,1:5]
## # A tibble: 11 x 5
## ProductID sweet salt sour lime
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 BYW 0.47 8.76 0.04 0.00
## 2 GMG 0.51 7.22 0.14 0.22
## 3 GUY 0.53 7.84 0.09 0.00
## 4 MED 0.44 6.88 0.10 0.00
## 5 MST 0.63 9.44 0.09 0.00
## 6 MTR 0.44 9.88 0.01 0.00
## 7 OCF 0.65 7.30 0.31 0.00
## 8 SAN 0.62 9.01 0.00 0.00
## 9 TBS 0.40 8.91 0.02 0.00
## 10 TOM 0.42 8.54 0.02 0.00
## 11 TRS 0.62 7.99 0.06 0.00
chips [,38:43]
## # A tibble: 11 x 6
## spots C1 C2 C3 C4 C5
## <dbl> <int> <int> <int> <int> <int>
## 1 10.5 8 4 6 7 6
## 2 5.0 3 3 6 7 6
## 3 13.0 6 6 7 6 6
## 4 10.5 3 4 3 4 8
## 5 8.0 9 4 7 8 2
## 6 8.0 4 7 7 9 7
## 7 11.0 4 8 7 8 6
## 8 10.5 7 8 9 4 8
## 9 9.5 8 4 9 7 6
## 10 12.0 7 9 6 8 6
## 11 10.0 2 6 9 7 8
The columns hold both the sensory descriptors and the “overall liking scores”, one column per consumer:
head(chips, 1)
## # A tibble: 1 x 118
## ProductID sweet salt sour lime astringent graincomplex toastedcorn
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 BYW 0.47 8.76 0.04 0 2.55 6.9 2.72
## # ... with 110 more variables: rawcorn <dbl>, masa <dbl>,
## # toastedgrain <dbl>, painty <dbl>, feedy <dbl>, heatedoil <dbl>,
## # scorched <dbl>, cardboard <dbl>, sourgrain <dbl>, microrough <dbl>,
## # macrorough <dbl>, oilygreasylip <dbl>, looseparticles <dbl>,
## # hardness <dbl>, crispness <dbl>, fracturability <dbl>,
## # cohesivemass <dbl>, roughofmass <dbl>, moistofmass <dbl>,
## # moistabsorp <dbl>, persistcrisp <dbl>, toothpack <dbl>,
## # looseparticles1 <dbl>, oilygreasyfilm <dbl>, DegreeofWhiteness <dbl>,
## # GrainFlecks <dbl>, CharMarks <dbl>, MicroSurfaceParticles <dbl>,
## # AmountofBubbles <dbl>, spots <dbl>, C1 <int>, C2 <int>, C3 <int>,
## # C4 <int>, C5 <int>, C6 <int>, C7 <int>, C8 <int>, C9 <int>, C10 <int>,
## # C11 <int>, C12 <int>, C13 <int>, C14 <int>, C15 <int>, C16 <int>,
## # C17 <int>, C18 <int>, C19 <int>, C20 <int>, C21 <int>, C22 <int>,
## # C23 <int>, C24 <int>, C25 <int>, C26 <int>, C27 <int>, C28 <int>,
## # C29 <int>, C30 <int>, C31 <int>, C32 <int>, C33 <int>, C34 <int>,
## # C35 <int>, C36 <int>, C37 <int>, C38 <int>, C39 <int>, C40 <int>,
## # C41 <int>, C42 <int>, C43 <int>, C44 <int>, C45 <int>, C46 <int>,
## # C47 <int>, C48 <int>, C49 <int>, C50 <int>, C51 <int>, C52 <int>,
## # C53 <int>, C54 <int>, C55 <int>, C56 <int>, C57 <int>, C58 <int>,
## # C59 <int>, C60 <int>, C61 <int>, C62 <int>, C63 <int>, C64 <int>,
## # C65 <int>, C66 <int>, C67 <int>, C68 <int>, C69 <int>, C70 <int>, ...
DATA PROCESSING: Only the last two consumers had missing values, so they were omitted from the analysis (the column range 39:116 used below keeps consumers C1 to C78). The data were first converted into a matrix:
r <- as.matrix(chips)
rownames(r) = (chips$ProductID)
r2 <- (r[,1:118])
#head(r2)
#summary(r2)
library(devtools)
install_github("vqv/ggbiplot")
## Skipping install of 'ggbiplot' from a github remote, the SHA1 (7325e880) has not changed since last install.
## Use `force = TRUE` to force installation
library(ggbiplot)
## Loading required package: ggplot2
## Loading required package: plyr
## Loading required package: scales
##
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
##
## col_factor
## Loading required package: grid
After the matrix conversion, all the variables were factors, so they were converted to “numeric” by writing the data to disk and reading it back:
r3 <-as.data.frame(r2)
library(data.table)
fwrite(r3, "r6")
r3 <- fread ("r6", colClasses = "numeric")
#str(r3)
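The loss of numeric types comes from calling as.matrix() on a table that still contains the character column ProductID. As a minimal sketch of an alternative to the write/read round trip, the ID column can be dropped before building the matrix. The toy data frame below is made up for illustration and is not the chips data.

```r
# Toy stand-in for the chips table: one ID column plus numeric columns
# (values are invented for illustration only).
toy <- data.frame(
  ProductID = c("AAA", "BBB", "CCC"),
  sweet     = c(0.47, 0.51, 0.53),
  C1        = c(8L, 3L, 6L),
  stringsAsFactors = FALSE
)

# as.matrix() on mixed column types yields a character matrix, which is
# why everything downstream stops being numeric.
m <- as.matrix(toy)
stopifnot(is.character(m))

# Alternative: drop the ID column first, build a numeric matrix,
# and keep the product codes as row names.
num <- as.matrix(toy[, setdiff(names(toy), "ProductID")])
rownames(num) <- toy$ProductID
storage.mode(num) <- "double"

str(num)
```

With this layout the product codes travel with the rows, so biplots can label points by product code rather than by row number.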
RESULTS AND DISCUSSION
PCA on the 37 descriptors only
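# PCA on the 37 descriptor columns, standardized to unit variance
chips_pca <- prcomp(r3[, 2:38], scale. = TRUE)
# Proportion of variance explained (PVE) by each component
pve <- chips_pca$sdev^2
pve <- pve / sum(chips_pca$sdev^2)
plot(1:length(pve), pve, xlab = "Number of PCs",
     ylab = "PVE", type = "b")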
- Three or four principal components describe most of the variance of the 37 descriptors
PCA biplot showing products and sensory descriptors
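A scree plot is read by eye, so it can help to also look at the cumulative PVE. The following is a minimal sketch on random toy data (not the chips data). The 80% threshold is a hypothetical choice for illustration, not a value used in this project.

```r
set.seed(1)
# Toy data: 11 rows (products) by 6 columns (descriptors), random values.
X <- matrix(rnorm(11 * 6), nrow = 11)

pca <- prcomp(X, scale. = TRUE)

# Proportion of variance explained (PVE) by each component.
pve <- pca$sdev^2 / sum(pca$sdev^2)
cum_pve <- cumsum(pve)

# Smallest number of components reaching the hypothetical 80% threshold.
n_keep <- which(cum_pve >= 0.80)[1]

print(round(cum_pve, 3))
print(n_keep)
```

Applied to chips_pca, the same two lines (pve and cumsum) would give a number to quote next to the scree plot.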
biplot(chips_pca, col = c("blue", "red"), choices=c(1,2), scale=0)
g <- ggbiplot(chips_pca, obs.scale = 1, var.scale = 1,
ellipse = TRUE,
circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal',
legend.position = 'top')
print(g)
- Products are fairly different except for two pairs that plot close together, a desirable quality for further modeling.
- The 11 products are a good representation of the product category or space.
- The product characteristics are spread across the product space.
Biplot showing correlation of consumers’ liking and sensory descriptors
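# Correlations across the 11 products: consumers (rows) by descriptors (columns)
c1 <- cor(r3[, 39:116], r3[, 2:38])
# The transposed arrangement: descriptors (rows) by consumers (columns)
c2 <- cor(r3[, 2:38], r3[, 39:116])
dim(c1)
## [1] 78 37
chips_pcajudge <- prcomp(c1[, 1:37], scale. = TRUE)
chips_pcajudge2 <- prcomp(c2[, 1:78], scale. = TRUE)
biplot(chips_pcajudge, col = c("blue", "red"), choices = c(1, 2), scale = 0)
# c1[1:78, ] selects the whole matrix, so this fit equals chips_pcajudge
chips_pcajudget <- prcomp(c1[1:78, ], scale. = TRUE)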
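Calling cor() with two arguments correlates every column of the first block with every column of the second, which is why 78 consumer columns and 37 descriptor columns give a 78 x 37 matrix. A minimal sketch on random toy data (the sizes and names below are illustrative assumptions, not the chips data):

```r
set.seed(2)
# Toy data: 11 products, 4 descriptors, 5 consumers (random values).
desc <- matrix(rnorm(11 * 4), nrow = 11,
               dimnames = list(NULL, paste0("d", 1:4)))
cons <- matrix(sample(1:9, 11 * 5, replace = TRUE), nrow = 11,
               dimnames = list(NULL, paste0("C", 1:5)))

# cor(x, y) correlates each column of x with each column of y,
# computed across the 11 products.
c1 <- cor(cons, desc)   # consumers in rows, descriptors in columns
c2 <- cor(desc, cons)   # the transposed arrangement

dim(c1)
all.equal(c1, t(c2))
```

Because the two matrices are transposes of each other, the two PCA fits differ only in which block plays the role of observations (points) and which plays the role of variables (arrows).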
Correlation of consumers and descriptors: another view
g <- ggbiplot(chips_pcajudget, obs.scale = 1, var.scale = 1,
ellipse = TRUE,
circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal',
legend.position = 'top')
print(g)
Products and consumer liking
chips_pcajudgeproduct<- prcomp(r3[,39:116], scale=T)
biplot(chips_pcajudgeproduct, col = c("blue", "green"), choices=c(1,2), scale=0)
Products 9, 8, 6, and 11 (TBS, SAN, MTR, and TRS) are more associated with liking
Biplot of the correlation of descriptors and consumers (the inverted case)
g <- ggbiplot(chips_pcajudge2, obs.scale = 1, var.scale = 1,
ellipse = TRUE,
circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal',
legend.position = 'top')
print(g)
Consumers are heavily associated with certain attributes, as shown by a region of the plot with a high density of consumer “vectors”, although the products also play a role. Because there is evidence that consumer liking is associated with specific descriptors, I decided to plot the descriptors and liking scores together with the products.
chips_pca2<- prcomp(r3[,2:116], scale=T)
pve <-(chips_pca2$sdev)^2
pve <- pve/sum((chips_pca2$sdev)^2)
plot (1:length(pve), pve, xlab ="Number of PCs",
ylab = "PVE", type = "b")
biplot(chips_pca2, col = c("blue", "green"), choices=c(1,2), scale=0)
g <- ggbiplot(chips_pca2, obs.scale = 1, var.scale = 1,
ellipse = TRUE,
circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal',
legend.position = 'top')
print(g)
Consumers tend to align with the descriptors on the left side of the plot, including crispness, fracturability, surface particles, saltiness, toasted corn flavor, and roughness of mass.
Descriptors associated with lower liking: sour grain flavor, painty, hardness, cohesiveness of mass, and feedy.
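Reading drivers of liking off a biplot is qualitative. As a simple cross-check, and a different technique from the PCA used above, each descriptor can be correlated with the mean liking per product. The sketch below uses synthetic data in which "crisp" is built to track liking and "painty" to oppose it. All names and values are illustrative assumptions.

```r
set.seed(3)
n_prod <- 11

# Synthetic mean liking per product (hypothetical values).
liking_mean <- runif(n_prod, 3, 8)

# Synthetic descriptors: one tracks liking, one opposes it, one is noise.
desc <- data.frame(
  crisp  =  liking_mean + rnorm(n_prod, sd = 0.3),
  painty = -liking_mean + rnorm(n_prod, sd = 0.3),
  noise  = rnorm(n_prod)
)

# Correlation of each descriptor with mean liking, sorted from most
# positive (candidate driver) to most negative (candidate detractor).
r_liking <- sapply(desc, function(col) cor(col, liking_mean))
print(sort(r_liking, decreasing = TRUE))
```

With only 11 products such correlations are noisy, so they are better treated as a complement to the biplot than as a test.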
CONCLUSIONS (PART 1)
We could separate the descriptors that were associated or dissociated with the liking of the tortilla chips. The biplot is limited to two dimensions, and in several cases more than two principal components (PCs) would be needed given the low variance explained by the first two PCs; even so, the first two PCs always explain more variance than any other pair.
APPENDIX OF PART 1
chips_pcajudgeproduct
## Standard deviations (1, .., p=11):
## [1] 4.313949e+00 3.590799e+00 3.100144e+00 2.968644e+00 2.643962e+00
## [6] 2.273684e+00 2.269728e+00 2.099977e+00 1.837996e+00 1.724031e+00
## [11] 1.765053e-15
##
## Rotation (n x k) = (78 x 11):
## PC1 PC2 PC3 PC4 PC5
## C1 -0.097836755 -0.0957684428 1.276612e-01 0.126721563 -0.1713686745
## C2 -0.059584014 0.0670050967 -6.791635e-03 -0.207474880 -0.0930879072
## C3 -0.179415392 -0.0468884633 1.026637e-01 -0.076214756 0.1404818245
## C4 -0.024400694 -0.1597265434 3.954129e-02 -0.169857459 -0.0147989689
## C5 -0.024018741 0.1977398301 -1.668875e-01 -0.079027611 0.1190296249
## C6 -0.128697054 -0.0088698665 -1.138576e-01 0.130265009 -0.0154137727
## C7 -0.067113719 -0.1417762146 8.432768e-02 0.164686959 0.0603990050
## C8 -0.154103225 -0.0588834249 1.833894e-03 -0.146675540 0.0804086058
## C9 0.045609014 -0.1486857658 3.084461e-02 -0.171479000 0.1546059456
## C10 -0.141633629 -0.1696225977 7.619432e-02 0.106649597 -0.0177246976
## C11 -0.073933989 -0.0945182810 2.713837e-02 0.008521640 -0.0170573945
## C12 -0.046180769 0.0340504261 -1.348716e-01 0.192399166 -0.1311556165
## C13 -0.162222229 -0.1256407293 -1.700747e-02 0.135560416 -0.0097002506
## C14 0.023464895 -0.1648524924 -1.356138e-01 -0.199067368 0.0161697276
## C15 -0.117138448 -0.1542820534 -8.095780e-02 -0.065153267 0.0338589200
## C16 -0.067324345 0.1207805134 1.800837e-01 0.104049069 -0.0808545427
## C17 -0.089641515 -0.1588305586 -1.710202e-02 -0.117398538 0.1019065971
## C18 -0.160891402 0.1438074556 4.440691e-03 0.031706634 -0.0707590123
## C19 -0.043254589 -0.1390365907 -1.512334e-01 -0.106521358 -0.0934317495
## C20 -0.049211102 -0.1610025116 7.865076e-02 0.002001799 0.2312085880
## C21 -0.108556112 -0.0248680085 -1.340482e-01 -0.091398262 -0.0416383388
## C22 -0.102081265 0.1401926450 -1.903442e-01 -0.037320764 0.0782209768
## C23 -0.114248311 -0.0648775638 -8.961040e-03 -0.122995784 0.0470134304
## C24 -0.172723158 0.0552838713 7.054330e-02 -0.013306737 0.0489741059
## C25 -0.137298454 -0.0181483658 -1.358536e-01 0.038880587 -0.1147966718
## C26 -0.112157696 0.0855178886 -1.552048e-03 -0.216486093 0.0580968233
## C27 0.064647452 0.0100889024 1.312774e-01 0.081834981 0.1645027756
## C28 0.013420439 0.2000816200 1.127556e-01 0.052253007 0.2040246864
## C29 -0.124667091 0.1754158796 5.519171e-02 -0.085695061 -0.0117234793
## C30 -0.130046953 0.0891866893 -1.638463e-01 -0.030697146 0.0462243027
## C31 -0.158061598 -0.1309886043 4.003588e-02 -0.009158586 -0.0415765274
## C32 -0.187559810 0.0612428347 -1.081196e-01 0.038369862 -0.1013811187
## C33 -0.090596596 0.1453739611 -8.517465e-02 0.008727334 0.2201822522
## C34 -0.078606664 0.1631970498 2.101435e-01 -0.063912561 0.0356405440
## C35 -0.132260018 0.2089213206 3.181631e-03 -0.019175689 -0.0321956764
## C36 -0.132887548 -0.0863660467 4.570115e-02 -0.133219837 0.0561983339
## C37 -0.037467825 -0.0205943414 -1.385635e-01 0.115769004 0.0139000427
## C38 -0.049972207 -0.1719674545 -6.867611e-02 0.106304877 0.0474733562
## C39 -0.010810633 0.1160143444 9.267973e-02 -0.131595103 0.1828784479
## C40 -0.171355098 -0.0145476565 1.530686e-01 0.024646480 0.0733725995
## C41 -0.115329119 -0.0891225226 -1.868762e-01 0.070370425 -0.0106434758
## C42 -0.131397833 0.0680454788 -9.995211e-02 0.194945268 0.1352565820
## C43 -0.156210445 0.0423482765 5.272824e-02 0.135977896 -0.1508326952
## C44 -0.126986710 0.0928982394 -8.093487e-03 0.182746639 0.0555943631
## C45 -0.093324139 0.0050312415 -1.930264e-02 0.059679802 -0.1557597966
## C46 -0.052937182 -0.2425034023 -1.220519e-01 -0.007791311 0.0094994237
## C47 -0.164077983 -0.0367791951 8.822508e-05 -0.130676812 0.0627375427
## C48 -0.122635052 0.1089335383 1.742140e-01 -0.009634586 -0.1176542545
## C49 -0.085776129 -0.0663101786 2.563305e-01 -0.035941303 0.0385090423
## C50 -0.145922314 -0.0656088945 1.821510e-01 -0.104136276 0.0082734857
## C51 -0.120724784 0.0668937033 1.930513e-01 -0.072183108 0.0514850768
## C52 -0.132544309 -0.1394817103 9.868060e-03 -0.146965559 0.0552783780
## C53 -0.040004892 -0.0973166203 7.140952e-02 -0.015225717 -0.0707680601
## C54 -0.032998581 0.0432664583 -1.066028e-01 -0.171627341 0.2592719923
## C55 -0.137244803 0.1338256616 -6.784176e-02 -0.142348846 0.1227038696
## C56 0.088841063 -0.1496560103 1.087585e-01 0.118710416 -0.0454468864
## C57 0.025517139 0.0410517640 -7.067986e-02 0.105124861 0.0692157118
## C58 0.017099202 0.0924748101 9.474657e-03 -0.099911932 -0.1710842235
## C59 0.057765272 0.0006373724 -2.232017e-01 -0.143200033 -0.0279873184
## C60 0.084850607 -0.0660488611 1.696930e-01 -0.033437886 0.2463970466
## C61 -0.127741959 0.1199295658 -3.667680e-02 -0.033144936 -0.0309423482
## C62 -0.018015963 0.1380835616 7.164449e-04 0.142723679 0.2602419638
## C63 -0.104832279 -0.1312437050 -1.906983e-01 0.047967424 -0.0165761437
## C64 -0.199955786 0.0420417951 -2.450818e-03 -0.100956853 0.0001409529
## C65 -0.150923678 0.1122467283 -4.372720e-02 0.086397698 -0.1291115213
## C66 -0.063662236 0.0928631997 -1.605194e-01 0.152207675 0.2117044319
## C67 -0.148556463 -0.0244833538 1.819379e-01 0.021537149 -0.0894717224
## C68 -0.005391266 -0.1873037908 6.894464e-04 0.133145091 0.1453306265
## C69 -0.159126843 0.0108104465 1.234046e-01 0.084953821 0.0158365970
## C70 -0.175627942 -0.0414491444 -2.344194e-02 0.177505904 -0.0008458169
## C71 0.036058090 0.0414489032 1.710002e-01 -0.151485579 -0.1010784380
## C72 0.005624086 -0.1955143322 1.203285e-01 0.093483704 0.1120754122
## C73 -0.197494291 0.0116860577 -4.131807e-03 -0.016936250 -0.0474161350
## C74 -0.198810004 -0.0461548583 4.455851e-02 0.025336669 -0.1030300528
## C75 -0.097764478 -0.0186129327 -1.556585e-01 0.075799211 0.2051961858
## C76 -0.055426044 -0.1557610994 -1.319003e-01 -0.163970575 -0.0580928658
## C77 -0.109379242 -0.1138995474 5.110095e-02 0.147458864 0.1778933240
## C78 -0.125649120 0.0006994273 -1.216137e-02 -0.172964987 -0.1784027101
## PC6 PC7 PC8 PC9 PC10
## C1 -0.010883371 -0.095165362 -0.179219004 0.040974118 -0.043318844
## C2 -0.102103224 0.097150870 -0.175243664 0.074714345 0.245307994
## C3 -0.013163594 -0.032299702 0.033921009 -0.023394468 0.152310687
## C4 0.040876010 0.196624351 -0.045668588 -0.045503593 -0.235713968
## C5 0.001147704 -0.086761787 0.013350142 0.053550200 0.068832390
## C6 -0.249670319 0.112440402 -0.005114131 0.024414383 0.092511687
## C7 0.050682522 -0.132576460 -0.182428163 -0.051119125 0.147560348
## C8 -0.203005096 -0.003747257 -0.050548073 -0.116047542 -0.053899615
## C9 0.115272435 0.073549986 -0.074947917 -0.188019036 0.005799097
## C10 0.072493186 -0.011264409 0.087962257 0.088376956 -0.055452720
## C11 0.070551887 0.351183420 0.088185936 0.147398608 -0.039138095
## C12 -0.007578001 -0.175435742 0.185614201 -0.062002331 -0.009773245
## C13 -0.080621802 -0.004683619 -0.125749684 -0.035252285 0.106452029
## C14 -0.056382257 0.054140433 0.058437326 0.047571598 0.138833431
## C15 -0.168094191 0.025386278 0.190198290 0.077136349 -0.009930280
## C16 0.077789410 -0.096562634 0.207488431 0.026689395 0.032726604
## C17 -0.189139265 0.019089115 0.147218058 0.076570832 0.093732984
## C18 -0.055803718 0.108220704 0.102997468 0.116972973 0.113242016
## C19 -0.177966201 -0.097806359 0.028006568 0.187278093 -0.006688631
## C20 0.178436971 0.013663032 -0.059841312 0.034674401 0.026585073
## C21 -0.257189126 0.020392746 -0.004803032 -0.222985712 -0.026635593
## C22 -0.093699998 -0.004531618 0.086381554 0.039677212 0.148652679
## C23 0.147283342 -0.067458765 0.084664176 0.207314824 0.284680549
## C24 0.077582671 0.218969858 -0.116564294 -0.016827847 -0.008861146
## C25 0.209707091 -0.058351452 0.103985021 0.067955337 0.134711555
## C26 0.147498036 -0.118195027 0.036469428 0.024036782 -0.119860690
## C27 -0.165420157 0.010863965 0.286550308 -0.014069229 0.027496729
## C28 -0.001077249 -0.061800413 0.007811697 0.033169397 0.081130490
## C29 -0.017091403 -0.118194467 -0.155144856 0.010372062 0.114646306
## C30 0.035950152 -0.183140645 0.163597960 -0.013256702 -0.033604098
## C31 0.042607241 0.176651994 0.110993982 0.107892362 -0.085625136
## C32 0.036730549 -0.019822099 0.089346033 0.112427988 -0.065832281
## C33 -0.011318459 0.070609664 -0.001759781 0.168535987 0.120480059
## C34 -0.054274572 0.035717086 -0.006521407 -0.014377462 -0.129203430
## C35 -0.115468985 0.021444343 -0.019322500 0.033330407 -0.090579355
## C36 0.191041896 -0.035960980 -0.109666718 -0.138158802 0.146473599
## C37 -0.017047694 0.279902669 0.115138337 -0.236627412 0.062698366
## C38 0.210680126 -0.137470988 -0.131346140 0.050331780 -0.005486451
## C39 -0.094594256 -0.083162586 -0.108061860 0.160953368 -0.212660598
## C40 -0.001385109 -0.077180904 0.089155228 -0.185487081 -0.015110270
## C41 0.065447045 -0.111531136 0.030128135 0.183460313 -0.149073252
## C42 0.049526146 -0.044952631 0.015138779 -0.091480332 0.051195850
## C43 -0.021547305 0.058082399 0.179258705 0.055647926 0.026586253
## C44 -0.055991532 0.074023845 -0.140162732 -0.170522935 0.119955242
## C45 -0.146807356 0.100253807 -0.313807272 0.058810507 -0.089627285
## C46 0.015725434 -0.014652245 0.092483194 -0.025184670 -0.031805995
## C47 0.206106821 -0.079443638 0.015274146 0.115754042 0.041515043
## C48 -0.131955929 -0.027937873 0.001234569 -0.079138304 -0.145158868
## C49 -0.084732891 0.091569815 0.033265526 0.086005229 -0.120224207
## C50 -0.002151938 -0.147753694 0.053308620 -0.038426544 0.033254910
## C51 0.079778753 -0.174266428 -0.068862823 -0.104359488 -0.021532173
## C52 -0.160815578 -0.033774138 0.020226633 0.141647358 -0.011659206
## C53 -0.090231909 -0.327286923 0.125311387 -0.024647605 -0.180486216
## C54 -0.041907725 0.070844735 0.041401873 -0.092518001 -0.123670741
## C55 0.083333626 -0.072802522 -0.055318541 -0.030806888 -0.059629426
## C56 0.024649489 -0.157846083 0.054528309 0.054664215 0.230060746
## C57 0.260270368 0.180317941 0.075634642 -0.131007347 -0.250391916
## C58 0.180312247 0.006722980 0.152796252 -0.260920799 0.173615779
## C59 -0.087509958 -0.155865744 -0.081857510 -0.151446616 0.019410750
## C60 -0.001005554 -0.010953774 0.123053494 -0.095095617 0.016450081
## C61 0.098486817 -0.030394290 -0.281482687 0.146609240 -0.039508894
## C62 -0.102784477 0.065089524 -0.014335391 0.062860739 -0.038702932
## C63 -0.021826219 -0.080033017 -0.033403327 0.042117613 -0.228583357
## C64 0.089767456 0.108365810 -0.025607067 0.095438757 0.050012697
## C65 0.178966606 0.009743019 -0.039779257 0.005074812 -0.115746689
## C66 -0.023013125 0.026466702 0.090064159 0.013912434 -0.043007752
## C67 -0.156744535 -0.094256872 0.002406211 -0.083747631 0.048111473
## C68 -0.056921349 -0.157652303 -0.133489210 -0.021908439 -0.078320129
## C69 -0.099892425 -0.062419452 0.152967202 -0.193220290 -0.070021394
## C70 -0.029678136 -0.012199455 -0.112808942 0.043456760 -0.133572019
## C71 0.089080207 0.007987386 0.198844799 0.187809948 -0.146838413
## C72 -0.077622879 0.079145708 -0.081628765 0.035300671 0.188435970
## C73 0.134916807 0.094559479 0.111278947 -0.117968275 0.064000422
## C74 0.001759030 0.096953296 -0.057031576 -0.124773804 0.087264049
## C75 0.011949590 -0.151185772 -0.016211480 -0.177070481 -0.069232678
## C76 0.087351477 0.007810613 0.010191311 -0.189486432 -0.117084696
## C77 0.082273430 0.150022022 -0.011267333 0.043368691 -0.069308313
## C78 -0.031363045 0.066740608 -0.010585837 -0.236797634 0.014922744
## PC11
## C1 -0.136986439
## C2 0.168061226
## C3 -0.203946733
## C4 0.110471897
## C5 -0.090275560
## C6 0.218049710
## C7 0.249435260
## C8 0.092464224
## C9 0.050190796
## C10 0.049361992
## C11 0.022108329
## C12 0.051508280
## C13 -0.090512589
## C14 -0.096193602
## C15 0.068000545
## C16 -0.074029294
## C17 0.055106641
## C18 -0.044681538
## C19 -0.103594332
## C20 0.092448329
## C21 -0.044290119
## C22 0.101407995
## C23 0.099310376
## C24 0.086225202
## C25 -0.011187472
## C26 -0.021995085
## C27 -0.267340631
## C28 0.166838565
## C29 -0.261425653
## C30 0.021496142
## C31 0.121487663
## C32 0.051462974
## C33 0.039124181
## C34 0.088160440
## C35 -0.010632411
## C36 0.045846282
## C37 0.049138285
## C38 -0.127139626
## C39 0.109936715
## C40 0.167302706
## C41 0.018245758
## C42 -0.036416897
## C43 -0.041152863
## C44 -0.064752064
## C45 -0.061403176
## C46 -0.018903000
## C47 -0.183328732
## C48 -0.011365516
## C49 -0.045410373
## C50 0.135999293
## C51 -0.093539266
## C52 -0.079960406
## C53 0.113393847
## C54 -0.068496654
## C55 -0.008960045
## C56 0.186790236
## C57 -0.016791785
## C58 0.122665357
## C59 0.111978768
## C60 -0.136685814
## C61 0.133467603
## C62 -0.035141226
## C63 -0.192998615
## C64 -0.036635430
## C65 -0.123287114
## C66 0.066659204
## C67 -0.096060431
## C68 -0.056395769
## C69 0.091157579
## C70 0.208449791
## C71 0.174198419
## C72 -0.135759803
## C73 -0.149278060
## C74 0.033372207
## C75 0.122306713
## C76 0.015642821
## C77 -0.048992971
## C78 -0.124546973
Principal component 1 is associated with texture and some additional flavors, whereas PC2 relates to chip-specific flavors.
PART 2: In this section each row represents one consumer evaluating one product. To save time, the descriptive data for each product were repeated for every consumer (78 times, since the last two consumers with missing data were eliminated). Besides overall liking (previously stored as one column per consumer, C1 to C78), measured on a 9-point hedonic scale (1 = extremely dislike, 9 = extremely like), other hedonic measurements were included, e.g. texture, flavor, and saltiness liking (9-point hedonic scale) and purchase intent, measured on a 5-point scale (5 = very likely to buy). Each of these variables is now a column and each row is a product × panelist combination, with the descriptive terms for that product added to the same row alongside the hedonic variables. In the new layout each consumer is identified by panID. Product clusters and consumer clusters will be explored.
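The change of layout from Part 1 (one row per product, one column per consumer) to Part 2 (one row per product and consumer) can be sketched in base R as follows. The toy table and its values are invented for illustration; the file chips3.csv used below was prepared separately, and this is not necessarily how it was built.

```r
# Toy wide table: one row per product, one liking column per consumer.
wide <- data.frame(
  ProductID = c("AAA", "BBB"),
  salt      = c(8.8, 7.2),
  C1        = c(8L, 3L),
  C2        = c(4L, 3L),
  C3        = c(6L, 6L),
  stringsAsFactors = FALSE
)

# Consumer columns are the ones named C1, C2, ...
cons_cols <- grep("^C[0-9]+$", names(wide), value = TRUE)

# Build one block per consumer and stack the blocks; the descriptor
# value (salt) is repeated for every consumer of the same product.
long <- do.call(rbind, lapply(cons_cols, function(cc) {
  data.frame(
    ProductID = wide$ProductID,
    panID     = as.integer(sub("^C", "", cc)),
    Liking    = wide[[cc]],
    salt      = wide$salt,
    stringsAsFactors = FALSE
  )
}))

long <- long[order(long$panID, long$ProductID), ]
print(long)
nrow(long)  # number of products times number of consumers
```

The same arithmetic explains the size of the Part 2 table: 11 products times 78 consumers gives 858 rows.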
DATA PREPARATION
library(readr)
chips3 <- read_csv("C:/Users/kennethmariano/Dropbox/2017/7142/Final Project/chips3.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## Product = col_character(),
## ProductID = col_character(),
## panID = col_integer(),
## Liking = col_integer(),
## Appearance = col_character(),
## Flavor = col_character(),
## Texture = col_character(),
## PurchaseIntent = col_character()
## )
## See spec(...) for full column specifications.
dim(chips3)
## [1] 858 45
class(chips3)
## [1] "tbl_df" "tbl" "data.frame"
chips3 <- as.matrix(chips3)
chips4 <-as.data.frame(chips3)
library(data.table)
fwrite(chips4, "r6")
chips4 <- fread ("r6", colClasses = "numeric")
#str(r3)
head(chips4)
## Product ProductID panID Liking Appearance Flavor Texture PurchaseIntent
## 1: P542 BYW 1 8 6 7 7 4
## 2: P225 GMG 1 3 8 3 4 1
## 3: P331 GUY 1 6 4 5 8 3
## 4: P961 MED 1 3 4 2 4 1
## 5: P580 MST 1 9 9 9 9 5
## 6: P375 MTR 1 4 9 4 5 2
## sweet salt sour lime astringent graincomplex toastedcorn rawcorn masa
## 1: 0.47 8.76 0.04 0.00 2.55 6.90 2.72 0.00 3.71
## 2: 0.51 7.22 0.14 0.22 2.45 7.21 2.25 0.18 3.65
## 3: 0.53 7.84 0.09 0.00 2.52 6.62 1.62 0.00 3.79
## 4: 0.44 6.88 0.10 0.00 2.53 6.47 2.29 0.00 3.80
## 5: 0.63 9.44 0.09 0.00 2.65 7.08 3.82 0.00 3.12
## 6: 0.44 9.88 0.01 0.00 2.58 6.91 2.84 0.00 3.17
## toastedgrain painty feedy heatedoil scorched cardboard sourgrain
## 1: 1.42 0.00 0 4.39 0.00 2.44 0
## 2: 2.18 0.45 0 4.36 0.47 2.04 0
## 3: 2.24 0.00 0 4.47 0.62 2.75 0
## 4: 1.52 0.21 0 4.31 0.00 3.34 0
## 5: 1.24 0.00 0 4.51 2.74 2.68 0
## 6: 1.97 0.00 0 4.35 0.55 2.55 0
## microrough macrorough oilygreasylip looseparticles hardness crispness
## 1: 8.77 3.20 6.57 6.71 8.82 10.36
## 2: 7.57 3.14 4.92 4.43 9.55 9.56
## 3: 8.56 2.81 5.84 4.96 8.58 9.92
## 4: 8.27 4.41 6.52 6.01 8.95 10.59
## 5: 8.11 4.14 6.43 5.26 8.71 10.24
## 6: 8.74 4.31 5.90 5.81 8.75 10.19
## fracturability cohesivemass roughofmass moistofmass moistabsorp
## 1: 7.43 2.68 7.64 7.33 9.07
## 2: 7.21 3.33 7.20 7.63 9.19
## 3: 7.77 3.18 7.38 7.14 9.50
## 4: 7.87 3.08 7.45 7.19 9.12
## 5: 7.50 2.89 7.27 7.29 9.64
## 6: 7.71 2.99 7.49 6.88 9.73
## persistcrisp toothpack looseparticles1 oilygreasyfilm DegreeofWhiteness
## 1: 6.00 5.21 7.21 3.99 6.0
## 2: 5.18 5.21 7.22 3.78 2.0
## 3: 5.06 5.10 6.94 3.85 7.0
## 4: 5.53 5.21 6.96 4.29 6.5
## 5: 4.94 5.37 7.34 3.85 5.0
## 6: 5.29 4.81 7.04 3.94 3.0
## GrainFlecks CharMarks MicroSurfaceParticles AmountofBubbles spots
## 1: 8.0 2.0 2.0 6 10.5
## 2: 3.0 7.0 0.5 6 5.0
## 3: 6.0 7.0 1.0 6 13.0
## 4: 7.5 3.0 2.0 6 10.5
## 5: 4.0 5.0 2.5 5 8.0
## 6: 3.0 4.5 2.0 5 8.0
chips4$Appearance <- as.numeric(chips4$Appearance)
## Warning: NAs introduced by coercion
chips4$Flavor <- as.numeric(chips4$Flavor)
## Warning: NAs introduced by coercion
chips4$Texture <- as.numeric(chips4$Texture)
## Warning: NAs introduced by coercion
chips4$PurchaseIntent <- as.numeric(chips4$PurchaseIntent)
## Warning: NAs introduced by coercion
chips4$`Saltiness JAR` <- as.numeric(chips4$`Saltiness JAR`)
DATA ANALYSIS
library(cluster)
library(fpc)
# Drop the identifier columns so only numeric variables enter the clustering
chips5 <- chips4
chips5$ProductID <- NULL
chips5$Product <- NULL
chips5$panID <- NULL
kclusters <-pamk(chips5)
table(kclusters$pamobject$clustering, chips4$ProductID)
##
## BYW GMG GUY MED MST MTR OCF SAN TBS TOM TRS
## 1 51 0 0 22 0 0 26 69 65 0 64
## 2 0 78 0 0 0 0 0 0 0 0 0
## 3 0 0 78 0 0 0 0 0 0 0 0
## 4 27 0 0 56 10 2 52 9 13 10 14
## 5 0 0 0 0 68 76 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 68 0
#par(mfrow=c(1,2));plot(kclusters$pamobject)
The pamk function selected 6 clusters. Product TOM stands largely alone in cluster 6, although some of its rows were also classified in cluster 4. Cluster 5 mostly represents MST and MTR. Clusters 2 and 3 are complete separations of GMG and GUY, whereas the largest cluster, cluster 1, gathers the remaining, more similar products. The cluster plots could not be created because some arguments were missing, so the validity of this table is still in doubt: the same problem may be affecting the table results.
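To make the choice of the number of clusters easier to inspect, the average silhouette width can be computed for each candidate k with pam() from the cluster package. To my understanding this is also the default criterion of pamk(), but that is an assumption to check against the fpc documentation. The sketch below uses synthetic, well-separated toy data, not the chips data.

```r
library(cluster)

set.seed(4)
# Toy data: three well-separated groups of 20 points in two dimensions.
X <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2)
)

# Average silhouette width for each candidate number of clusters.
ks  <- 2:6
asw <- sapply(ks, function(k) pam(X, k)$silinfo$avg.width)

# The candidate with the widest average silhouette.
best_k <- ks[which.max(asw)]

print(round(setNames(asw, ks), 3))
print(best_k)
```

Printing the whole vector of widths, rather than only the winner, shows how close the runner-up values are, which is useful when the selected solution is hard to interpret.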
kclustersx <-pam(chips5, 4)
table(kclustersx$clustering,chips4$ProductID)
##
## BYW GMG GUY MED MST MTR OCF SAN TBS TOM TRS
## 1 38 0 10 6 0 0 19 67 64 25 57
## 2 0 78 0 0 0 2 0 0 0 0 0
## 3 40 0 68 72 6 0 59 11 14 53 21
## 4 0 0 0 0 72 76 0 0 0 0 0
#par(mfrow=c(1,2));plot(kclustersx)
Trying only four clusters still offers some information: cluster 2 is mostly composed of product GMG, cluster 4 of MST and MTR, and clusters 1 and 3 contain a mixture of the remaining products.
Conclusions: I approached this project from an unsupervised mindset, but a degree of supervision could not be avoided, since exploring the drivers of overall liking does involve a supervised philosophy. Nevertheless, valuable information was gained regarding the descriptors that influence liking. When all liking attributes and descriptors are considered for clustering, cluster 1 is the largest and only a few other clusters represent one or two products each; this trend was also seen with PCA. Because of the low variance explained by the first two principal components, principal components 3 and 4 were also plotted, but the plots were omitted because they gave no clear information.