Breakfast Cereal Data from General Mills (1), Kellogg (2), and Quaker (3)
| Brand | Manufacturer | Calories | Protein | Fat | Sodium | Fiber | Carbohydrates | Sugar | Potassium | Group |
|---|---|---|---|---|---|---|---|---|---|---|
| ACCheerios | G | 110 | 2 | 2 | 180 | 1.5 | 4 | 10 | 70 | 1 |
| Cheerios | G | 110 | 6 | 2 | 290 | 2.0 | 13 | 1 | 105 | 1 |
| CocoaPuffs | G | 110 | 1 | 1 | 180 | 0.0 | 7 | 13 | 55 | 1 |
| CountChocula | G | 110 | 1 | 1 | 180 | 0.0 | 7 | 13 | 65 | 1 |
| GoldenGrahams | G | 110 | 1 | 1 | 280 | 0.0 | 11 | 9 | 45 | 1 |
| HoneyNutCheerios | G | 110 | 3 | 1 | 250 | 1.5 | 6 | 10 | 90 | 1 |
Assuming multivariate normal data with a common covariance matrix, equal costs, and equal priors:
(a) Classify the cereal brands according to manufacturer. Compute the estimated E(AER) using the holdout procedure.
(b) Interpret the coefficients of the discriminant functions. Do some manufacturers appear to be associated with more “nutritional” cereals (high protein, low fat, high fiber, low sugar, and so forth) than others?
(c) Plot the cereals in the two-dimensional discriminant space, using different plotting symbols to identify the three manufacturers.
(d) Calculate the Euclidean distances between pairs of cereal brands.
(e) Treating the distances calculated in (d) as measures of (dis)similarity, cluster the cereals using the single linkage and complete linkage hierarchical procedures. Construct dendrograms and compare the results.
(f) Use the K-means clustering method to cluster the cereals into K = 2, 3, and 4 groups. Compare the results with those in part (e).
a. Classification according to manufacturer
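library(MASS)  # provides lda()
set.seed(2634)
# CV = T requests leave-one-out (holdout) classification of every cereal
lda.class <- lda(Manufacturer ~ ., data = df, prior = c(1, 1, 1)/3, CV = T)
lda.class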
$class
[1] Q G G G G G G G G G Q G K K G K G K K K K Q K K K K K G K G K K K G G
[36] K K G G G Q Q Q
Levels: G K Q
$posterior
G K Q
1 0.278140838 0.0288956035984 0.6929635579091
2 0.730719531 0.1254803009451 0.1438001677134
3 0.892517863 0.0147145785487 0.0927675588832
4 0.908670975 0.0058937254859 0.0854352999719
5 0.946025582 0.0235734557260 0.0304009621423
6 0.834693817 0.1242999797820 0.0410062035470
7 0.863002362 0.0832841238998 0.0537135145895
8 0.898741508 0.0181532056189 0.0831052868405
9 0.611557678 0.2686117274092 0.1198305950522
10 0.885693026 0.0296701260959 0.0846368481141
11 0.112184389 0.0002796064574 0.8875360047734
12 0.680369344 0.1867973002972 0.1328333558536
13 0.089987322 0.9099946320610 0.0000180460998
14 0.120934789 0.8165935252200 0.0624716857767
15 0.598145848 0.2015665085673 0.2002876437771
16 0.104519662 0.8519051487649 0.0435751892293
17 0.584428213 0.3637771292141 0.0517946577939
18 0.475810090 0.5231557032391 0.0010342064404
19 0.026391108 0.9707286073204 0.0028802847828
20 0.169418260 0.8262758439537 0.0043058955743
21 0.006311571 0.9914992170217 0.0021892120816
22 0.004888582 0.0000004913226 0.9951109271696
23 0.019651176 0.9795082197202 0.0008406046272
24 0.201829875 0.6770747585056 0.1210953667640
25 0.021316051 0.9778924032989 0.0007915460586
26 0.002358525 0.9912760296143 0.0063654458280
27 0.001825402 0.9981739808290 0.0000006170877
28 0.581189584 0.3172358724258 0.1015745430967
29 0.161801169 0.8377511253629 0.0004477055684
30 0.829202368 0.1232867606857 0.0475108714987
31 0.128971096 0.8686542862038 0.0023746174202
32 0.013130587 0.9833586139792 0.0035107993891
33 0.375295542 0.6218471964108 0.0028572616767
34 0.803257106 0.1946351789989 0.0021077147698
35 0.776510431 0.2191417976383 0.0043477710096
36 0.075498413 0.9237815938592 0.0007199928163
37 0.338119094 0.6603473130406 0.0015335934002
38 0.967229996 0.0178711802686 0.0148988233307
39 0.719405356 0.2741577481113 0.0064368955709
40 0.633077966 0.0348538927703 0.3320681413697
41 0.040617057 0.0003060884612 0.9590768549983
42 0.014304551 0.0001343497100 0.9855610994267
43 0.040795598 0.0173098622694 0.9418945392514
$terms
Manufacturer ~ Calories + Protein + Fat + Sodium + Fiber + Carbohydrates +
Sugar + Potassium
attr(,"variables")
list(Manufacturer, Calories, Protein, Fat, Sodium, Fiber, Carbohydrates,
Sugar, Potassium)
attr(,"factors")
Calories Protein Fat Sodium Fiber Carbohydrates Sugar
Manufacturer 0 0 0 0 0 0 0
Calories 1 0 0 0 0 0 0
Protein 0 1 0 0 0 0 0
Fat 0 0 1 0 0 0 0
Sodium 0 0 0 1 0 0 0
Fiber 0 0 0 0 1 0 0
Carbohydrates 0 0 0 0 0 1 0
Sugar 0 0 0 0 0 0 1
Potassium 0 0 0 0 0 0 0
Potassium
Manufacturer 0
Calories 0
Protein 0
Fat 0
Sodium 0
Fiber 0
Carbohydrates 0
Sugar 0
Potassium 1
attr(,"term.labels")
[1] "Calories" "Protein" "Fat" "Sodium"
[5] "Fiber" "Carbohydrates" "Sugar" "Potassium"
attr(,"order")
[1] 1 1 1 1 1 1 1 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: R_GlobalEnv>
attr(,"predvars")
list(Manufacturer, Calories, Protein, Fat, Sodium, Fiber, Carbohydrates,
Sugar, Potassium)
attr(,"dataClasses")
Manufacturer Calories Protein Fat Sodium
"factor" "numeric" "numeric" "numeric" "numeric"
Fiber Carbohydrates Sugar Potassium
"numeric" "numeric" "numeric" "numeric"
$call
lda(formula = Manufacturer ~ ., data = df, prior = c(1, 1, 1)/3,
CV = T)
$xlevels
named list()
holdout <- lda.class$class
table(df$Manufacturer, holdout)
holdout
G K Q
G 12 3 2
K 4 15 1
Q 3 0 3
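Part (a) also asks for the estimated E(AER), which follows directly from the holdout confusion matrix above: it is the share of off-diagonal (misclassified) cereals among all 43. A minimal sketch that re-enters the printed counts by hand (the object names `conf`, `misclassified`, and `aer` are illustrative, not from the original analysis):

```r
# Holdout confusion matrix copied from the table above
# (rows = actual manufacturer, columns = holdout prediction)
conf <- matrix(c(12,  3, 2,
                  4, 15, 1,
                  3,  0, 3),
               nrow = 3, byrow = TRUE,
               dimnames = list(actual = c("G", "K", "Q"),
                               predicted = c("G", "K", "Q")))

misclassified <- sum(conf) - sum(diag(conf))  # off-diagonal counts
aer <- misclassified / sum(conf)              # estimated E(AER)
aer
```

Here 13 of the 43 cereals are misclassified, an estimated error rate of about 30%. With the fitted objects still in the session, `mean(holdout != df$Manufacturer)` should give the same value.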
b. Interpretation of discriminant functions’ coefficients
lda.fit <- lda(Manufacturer ~ ., data = df, prior = c(1, 1, 1)/3)
lda.fit
Call:
lda(Manufacturer ~ ., data = df, prior = c(1, 1, 1)/3)
Prior probabilities of groups:
G K Q
0.3333333 0.3333333 0.3333333
Group means:
Calories Protein Fat Sodium Fiber Carbohydrates Sugar
G 110.5882 2.352941 1.235294 203.52941 1.294118 9.705882 8.117647
K 111.0000 2.600000 0.650000 185.50000 2.250000 12.050000 7.950000
Q 90.0000 2.333333 1.333333 98.33333 1.116667 5.500000 5.000000
Potassium
G 85.00000
K 91.75000
Q 58.33333
Coefficients of linear discriminants:
LD1 LD2
Calories -0.042394995 -0.022385273
Protein -0.192736441 0.043708862
Fat 1.030238269 0.230557790
Sodium -0.002097074 0.008271173
Fiber -0.938811912 -1.424824908
Carbohydrates -0.112847256 0.016979986
Sugar -0.103510958 0.070253284
Potassium 0.019779960 0.035706831
Proportion of trace:
LD1 LD2
0.8095 0.1905
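One way to start reading these coefficients is to see where each manufacturer's mean lands on LD1 and LD2. The sketch below re-enters the group means and coefficients from the printout and projects the means onto the two discriminants. The object names are illustrative, and the scores are not centred, so only differences between manufacturers are meaningful.

```r
# Group means and LD coefficients copied from the lda() printout above
vars <- c("Calories", "Protein", "Fat", "Sodium", "Fiber",
          "Carbohydrates", "Sugar", "Potassium")
group_means <- rbind(
  G = c(110.5882, 2.352941, 1.235294, 203.52941, 1.294118,  9.705882, 8.117647, 85.00000),
  K = c(111.0000, 2.600000, 0.650000, 185.50000, 2.250000, 12.050000, 7.950000, 91.75000),
  Q = c( 90.0000, 2.333333, 1.333333,  98.33333, 1.116667,  5.500000, 5.000000, 58.33333))
colnames(group_means) <- vars

coefs <- cbind(
  LD1 = c(-0.042394995, -0.192736441, 1.030238269, -0.002097074,
          -0.938811912, -0.112847256, -0.103510958, 0.019779960),
  LD2 = c(-0.022385273, 0.043708862, 0.230557790, 0.008271173,
          -1.424824908, 0.016979986, 0.070253284, 0.035706831))
rownames(coefs) <- vars

# Project each group mean onto the two discriminants
centroid_scores <- group_means %*% coefs
centroid_scores
```

In this projection the Kellogg mean falls lowest on LD1 and the Quaker mean highest, with General Mills in between. Since LD1 loads positively on fat and negatively on fiber and protein, that ordering is at least consistent with Kellogg cereals leaning toward the "nutritional" end. The coefficients are unstandardized, though, so their magnitudes depend on each variable's units and should be read with care.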
c. Plotting in 2D discriminant space
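The plot for this part could be produced along the following lines. Because the full cereal table is not reproduced in this section, the sketch uses the built-in `iris` data as a stand-in (three classes, like the three manufacturers); with the cereal data, `iris` and `Species` would presumably be replaced by `df` and `Manufacturer`.

```r
library(MASS)  # lda()

# Stand-in data: iris has three classes, like the three manufacturers
fit <- lda(Species ~ ., data = iris, prior = c(1, 1, 1)/3)

# Discriminant scores: one row per observation, columns LD1 and LD2
scores <- predict(fit)$x

# One plotting symbol and colour per class (indexed by factor level)
grp <- iris$Species
plot(scores[, "LD1"], scores[, "LD2"],
     pch = c(1, 2, 3)[grp], col = c("black", "red", "blue")[grp],
     xlab = "LD1", ylab = "LD2",
     main = "Observations in discriminant space")
legend("topright", legend = levels(grp),
       pch = c(1, 2, 3), col = c("black", "red", "blue"))
```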

d. Euclidean Distance between pairs
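# df still holds the Manufacturer factor in column 1; dist() treats it as NA and rescales, so
# the values below are sqrt(9/8) times the distances from the 8 nutrient columns, dist(df[, 2:9]).
d <- dist(df, method = "euclidean")
d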
1 2 3 4 5 6
2 123.252409
3 16.678017 129.078465
4 7.290833 125.094964 10.606602
5 109.609107 65.366276 106.763758 108.332821
6 77.260922 47.020275 83.119831 78.955090 57.679340
7 91.832899 76.183167 87.509285 90.043739 23.406730 55.644238
8 16.474412 128.938939 1.500000 10.712143 106.732141 83.065674
9 49.366297 77.005682 58.046318 51.907369 80.351571 34.242244
10 58.345951 130.470902 73.064398 63.153830 142.843993 93.132969
11 86.233694 164.331315 100.546165 91.156630 180.057022 128.694405
12 45.264846 121.091907 33.237780 40.831973 85.920021 79.918122
13 173.226459 173.352243 188.731953 178.310824 220.632613 164.820072
14 49.944031 96.286681 64.194042 54.734587 110.099387 58.933490
15 64.073444 180.860443 53.054218 60.018747 150.067485 135.578967
16 50.135130 96.280839 64.290357 54.847516 110.114713 59.057440
17 25.306249 107.084663 23.165168 23.165168 86.357542 62.220626
18 282.187936 234.355499 297.174191 287.162846 295.808215 248.353642
19 72.298211 192.981864 64.150214 69.211632 165.388180 147.138646
20 123.862449 75.232141 120.023435 122.344289 20.594295 74.179891
21 109.331634 230.924501 102.445717 106.747951 203.317793 185.276567
22 104.561136 169.954038 119.535037 109.720668 192.436028 138.408025
23 61.821568 108.990252 52.058861 58.181827 66.144728 72.337101
24 72.220366 192.900233 64.132675 69.195376 165.354165 147.108060
25 52.684497 128.331115 38.462644 47.611711 87.509285 87.124372
26 193.994281 307.927751 197.248574 194.953841 302.881165 265.648793
27 142.922728 105.943971 157.196135 147.599543 160.033200 107.295882
28 18.317512 136.112086 15.443445 15.443445 117.824021 91.033133
29 114.203355 168.638296 127.610246 118.466767 191.680398 140.156720
30 36.019526 127.341077 22.699119 31.068473 96.199012 83.713238
31 83.894465 85.381497 96.374270 87.823687 115.060853 62.859218
32 34.715360 130.081225 46.342475 38.374796 128.146888 88.304339
33 151.864351 72.109292 149.887458 151.009106 44.471901 89.726146
34 183.571597 167.281948 199.178312 188.737914 220.106792 165.109907
35 123.462165 74.578817 119.539742 121.869808 17.557050 73.478951
36 122.096913 243.894188 118.466767 120.351049 223.059129 198.701173
37 56.705654 82.908082 54.621424 55.641711 54.744863 43.991832
38 57.523093 106.521125 48.628695 54.104066 65.606783 67.393666
39 51.266766 99.283684 45.074938 48.674942 64.708964 58.406175
40 43.139671 149.492893 54.734587 46.993351 148.261172 106.828279
41 209.861338 328.392106 206.136787 208.578223 305.556664 284.281166
42 202.653414 319.691844 201.840593 202.397196 304.047077 276.202202
43 196.151714 308.085704 200.488033 197.662468 305.413574 266.461057
7 8 9 10 11 12
2
3
4
5
6
7
8 87.380633
9 69.048896 57.910491
10 129.913553 73.002782 69.731852
11 166.794503 100.479009 100.551759 50.356231
12 63.860199 32.897568 63.311729 98.517289 129.705556
13 217.533618 188.731953 157.924428 119.187484 117.729696 209.967855
14 98.510152 64.000977 30.224163 46.972399 72.142437 80.435533
15 128.908398 53.054218 110.099387 107.839725 122.726449 65.778036
16 98.470173 64.097582 30.279944 47.056150 72.305990 80.386566
17 67.433300 22.921060 39.771221 74.361658 107.143741 27.392974
18 300.174324 297.136332 249.881722 241.551819 232.401390 311.981971
19 144.534166 64.167749 120.126496 109.007024 118.477450 81.429417
20 34.007352 119.920286 94.678403 159.604687 194.896848 96.064432
21 182.013736 102.445717 157.135292 137.531474 138.546134 118.457271
22 180.642603 119.469138 113.614480 57.630558 23.841272 147.850854
23 43.757856 51.842309 64.830548 111.763841 145.539621 21.917459
24 144.448520 64.132675 120.051551 108.955409 118.410963 81.277303
25 66.475559 38.433384 73.131730 107.829292 138.326719 15.112081
26 283.343211 197.208646 233.598213 184.372669 154.686477 223.464762
27 161.359846 157.181821 110.471716 105.647557 120.888508 170.594915
28 97.897268 14.924812 62.794307 67.368622 92.034810 41.677332
29 180.997238 127.601430 123.036580 57.386028 69.116071 152.768207
30 75.456113 22.500000 65.495229 88.164087 120.888508 17.937391
31 109.340637 96.292523 60.271262 55.725050 96.614731 107.916634
32 111.323852 46.050244 54.373707 54.033844 64.379976 69.882938
33 64.769785 149.804873 116.445803 180.979367 216.398721 128.177611
34 219.284917 199.167015 161.021350 134.658944 131.569967 218.320865
35 32.293575 119.445594 95.311988 157.839812 195.248640 95.471200
36 201.954822 118.485759 168.528188 137.996943 131.377438 138.614123
37 36.109209 54.332311 40.430496 96.608909 132.328781 38.739515
38 46.099078 48.628695 62.596925 105.210058 141.812486 27.331301
39 45.903431 45.049972 52.703178 96.247240 133.597928 29.008619
40 131.299276 54.549290 74.593901 46.792427 49.240799 84.180609
41 284.407278 206.071286 252.285552 228.562423 206.322942 222.675493
42 283.575387 201.768122 243.263746 213.034988 184.543440 222.498596
43 286.272731 200.403845 234.632535 183.866106 152.063457 227.415262
13 14 15 16 17 18
2
3
4
5
6
7
8
9
10
11
12
13
14 135.095799
15 226.142101 111.419814
16 135.108290 1.060660 111.465241
17 183.549380 54.394393 73.934938 54.404733
18 142.523682 234.012019 340.840138 233.985576 286.403780
19 225.429590 117.637685 17.298844 117.709388 86.240217 342.453646
20 236.707890 124.364384 160.391318 124.332719 99.938731 305.770134
21 248.807908 151.514851 53.318618 151.548260 124.301046 368.528662
22 97.309429 84.200653 143.479528 84.327487 124.128965 217.005472
23 217.259350 88.245396 85.992005 88.200765 38.855502 313.526913
24 225.429590 117.532442 17.168285 117.604209 86.161912 342.424079
25 220.068455 91.253767 63.701452 91.284582 37.349699 322.411034
26 247.958666 212.452642 168.934899 212.465880 216.621502 362.810901
27 71.206566 97.730497 205.009451 97.759271 144.802452 149.741026
28 184.560966 62.883821 49.613506 62.874876 31.908071 295.073296
29 88.149731 98.943166 153.209416 98.937480 130.042301 227.305686
30 202.755395 78.280745 56.564123 78.302299 26.045633 310.956187
31 111.121555 52.317540 143.984374 52.242224 83.671232 219.035956
32 159.646093 39.771221 79.902284 39.728768 50.266788 262.956032
33 243.887269 144.806336 192.783557 144.786912 128.864755 300.382256
34 31.925695 140.268047 240.135899 140.288096 191.615827 113.891396
35 235.026594 124.874437 160.029685 124.833890 99.368632 307.071653
36 240.643668 157.671335 76.903348 157.624950 139.902645 361.666455
37 193.453483 67.408271 101.313869 67.416615 32.982950 286.248581
38 210.444173 85.808653 86.181495 85.880731 36.264652 310.297035
39 200.023749 75.916895 88.130585 75.998355 29.259614 299.917489
40 155.563090 55.742713 76.124076 55.853603 65.897648 267.902268
41 319.346794 240.851614 162.405742 240.872632 226.763092 425.906240
42 294.074608 227.970393 164.306497 228.017269 222.096263 398.586722
43 242.739884 212.476708 174.668833 212.537588 219.387218 356.461816
19 20 21 22 23 24
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20 176.257979
21 38.812047 213.427388
22 139.092146 207.948912 157.935113
23 102.104726 75.194747 138.885744 162.609963
24 1.500000 176.187755 38.783050 139.019333 101.994485
25 79.972652 97.315209 116.817165 157.006369 24.325912 79.944512
26 152.617168 315.340887 128.322348 162.107603 245.232798 152.572933
27 209.385709 174.535813 240.758177 111.666915 171.794208 209.380336
28 58.615271 130.556022 95.241141 111.318799 62.019150 58.461526
29 150.370792 209.366903 170.162422 55.500000 165.741516 150.355828
30 71.056316 108.690501 108.793957 138.654697 36.171467 70.977109
31 150.389494 132.302211 183.714793 97.043161 110.863091 150.329638
32 83.886083 140.448389 115.431798 84.560629 85.578911 83.738432
33 208.311065 33.591294 245.924531 227.246287 107.786247 208.257053
34 240.636656 234.736821 266.065077 113.639232 223.583038 240.627305
35 175.941752 10.764525 213.161031 207.745939 74.450487 175.877798
36 61.142252 234.076910 32.380550 148.621163 159.994531 61.160649
37 115.392807 68.188159 153.510179 147.156804 29.182615 115.295273
38 101.535708 78.567646 139.229577 157.988528 18.155578 101.502463
39 102.659632 79.138328 140.836341 149.138778 23.116553 102.615788
40 75.202227 161.923593 102.703457 70.738957 102.418260 75.067470
41 148.708187 312.975638 115.558427 223.313681 242.564424 148.609808
42 149.342308 312.846208 119.859293 199.471803 243.208244 149.244347
43 158.482258 318.139515 136.214908 158.200905 249.074289 158.382847
25 26 27 28 29 30
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26 226.864772
27 180.455673 272.754697
28 49.306440 185.588254 157.188979
29 161.728940 182.941589 109.294327 120.724376
30 22.070908 212.439403 167.829154 32.066337 140.836341
31 118.057719 239.389849 70.905747 96.723834 83.819001 103.053384
32 79.584860 181.062489 133.828248 38.579463 107.781028 65.974427
33 129.873496 344.461536 177.314340 160.296600 227.082859 139.830254
34 228.588878 268.600540 61.855679 196.041450 114.310433 213.285021
35 96.718018 315.535260 173.575992 130.128782 206.365877 107.110924
36 139.201293 99.221343 240.954353 108.773273 162.062488 128.164445
37 46.147860 248.877982 144.401783 64.097582 150.161163 47.196398
38 26.303517 244.357320 165.907730 60.930288 157.945798 32.726136
39 32.048791 241.546579 155.374950 57.000000 149.146321 32.657312
40 92.149335 159.243917 140.648498 44.585592 94.345111 75.441202
41 222.045603 104.951775 323.982638 197.262832 250.630206 216.637081
42 223.484899 75.768562 304.300427 191.909158 229.891822 215.413266
43 231.235813 15.913084 269.614913 188.791224 180.287552 216.284422
31 32 33 34 35 36
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32 86.383592
33 145.623659 166.462834
34 119.370222 168.110455 238.702744
35 129.278769 141.688479 35.273928 234.024037
36 188.484416 121.398105 265.773165 259.389427 233.766711
37 86.650159 76.800879 96.735464 197.965906 67.433300 171.141389
38 103.619496 86.109668 109.479450 217.794628 76.382262 160.089428
39 93.253686 79.138328 108.871484 207.184700 76.998377 160.110509
40 93.728998 25.894015 188.185746 167.976561 162.322595 104.764021
41 280.333507 201.743030 345.102702 335.079655 314.983928 103.238801
42 266.491792 190.417305 343.822447 309.798120 314.876762 101.080414
43 238.743013 182.415326 346.684837 263.275191 318.384125 108.337442
37 38 39 40 41 42
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38 28.774989
39 21.685248 10.712143
40 95.712460 100.337555 93.884903
41 255.773093 246.130301 247.220296 188.111004
42 252.408152 245.768946 245.255734 174.561594 37.529988
43 251.351599 247.852681 244.596354 160.112230 114.247872 82.981933
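The printed distances can be checked by hand against the cereal table. The sketch below recomputes the distance between ACCheerios (row 1) and CountChocula (row 4) from the eight nutrient columns only. It then applies the sqrt(9/8) rescaling that `dist()` uses when one of nine columns is missing, which is what appears to happen to the non-numeric `Manufacturer` column. The data frame `cereal` is a hand-copied two-row excerpt, not the original `df`.

```r
# Two rows copied from the cereal table at the top of the section
cereal <- data.frame(
  Manufacturer  = factor(c("G", "G")),
  Calories      = c(110, 110),
  Protein       = c(2, 1),
  Fat           = c(2, 1),
  Sodium        = c(180, 180),
  Fiber         = c(1.5, 0.0),
  Carbohydrates = c(4, 7),
  Sugar         = c(10, 13),
  Potassium     = c(70, 65),
  row.names     = c("ACCheerios", "CountChocula"))

# Euclidean distance on the eight numeric columns only
d_num <- as.numeric(dist(cereal[, -1], method = "euclidean"))
d_num

# Rescaling applied by dist() when 1 of 9 columns is NA
d_num * sqrt(9 / 8)
```

The rescaled value reproduces the 7.290833 printed for the pair (1, 4), while the distance on the nutrient columns alone is sqrt(47.25). Every entry is inflated by the same factor, so the ordering of the distances, and hence the single and complete linkage merge order, is unaffected.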
e. Clustering with single/complete linkage hierarchical procedures
set.seed(2634)
# Single Linkage Hierarchical Clustering
clust.single <- hclust(d, method = "single")
clust.single
Call:
hclust(d = d, method = "single")
Cluster method : single
Distance : euclidean
Number of objects: 43
plot(clust.single, cex = 0.5)
groups <- cutree(clust.single, k = 3)
groups
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 1 3 3 3
rect.hclust(clust.single, k = 3, border = "red")

# Average Linkage Hierarchical Clustering (method = "ave"); complete linkage, as the task asks, would use method = "complete"
cluster.complete <- hclust(d, method = "ave")
cluster.complete
Call:
hclust(d = d, method = "ave")
Cluster method : average
Distance : euclidean
Number of objects: 43
plot(cluster.complete, cex = 0.5)
groups <- cutree(cluster.complete, k = 3)
groups
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 1 3 3 3
rect.hclust(cluster.complete, k = 3, border = "red")
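For reference, complete linkage itself is requested with `method = "complete"`. The sketch below shows the single versus complete comparison on the six General Mills rows listed at the top of the section. It is only an illustration on a subset, not a rerun of the 43-cereal analysis, and the object names are illustrative.

```r
# The six General Mills rows from the table at the top of the section
x <- matrix(c(110, 2, 2, 180, 1.5,  4, 10,  70,
              110, 6, 2, 290, 2.0, 13,  1, 105,
              110, 1, 1, 180, 0.0,  7, 13,  55,
              110, 1, 1, 180, 0.0,  7, 13,  65,
              110, 1, 1, 280, 0.0, 11,  9,  45,
              110, 3, 1, 250, 1.5,  6, 10,  90),
            ncol = 8, byrow = TRUE,
            dimnames = list(
              c("ACCheerios", "Cheerios", "CocoaPuffs", "CountChocula",
                "GoldenGrahams", "HoneyNutCheerios"),
              c("Calories", "Protein", "Fat", "Sodium", "Fiber",
                "Carbohydrates", "Sugar", "Potassium")))

d6 <- dist(x)                                   # Euclidean by default
hc_single   <- hclust(d6, method = "single")    # nearest-neighbour merging
hc_complete <- hclust(d6, method = "complete")  # farthest-neighbour merging

# Cross-tabulate the two 2-cluster solutions to see where they differ
table(single = cutree(hc_single, k = 2),
      complete = cutree(hc_complete, k = 2))
```

Calling `plot(hc_complete, cex = 0.5)` would draw the corresponding dendrogram, in the same way as for the single linkage fit above.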

f. K-means Clustering
set.seed(2634)
clusters <- kmeans(df[, 2:9], centers = 3, nstart = 10)
clusters
K-means clustering with 3 clusters of sizes 7, 8, 28
Cluster means:
Calories Protein Fat Sodium Fiber Carbohydrates Sugar
1 117.1429 3.142857 1.428571 190.00 4.642857 9.571429 10.142857
2 92.5000 2.250000 0.500000 51.25 1.337500 7.375000 7.750000
3 110.0000 2.357143 1.000000 215.00 1.089286 11.178571 6.928571
Potassium
1 205.71429
2 49.37500
3 64.10714
Clustering vector:
[1] 3 3 3 3 3 3 3 3 3 3 1 3 1 3 3 3 3 1 2 3 2 1 3 2 3 2 1 3 1 3 3 3 3 1 3
[36] 2 3 3 3 3 2 2 2
Within cluster sum of squares by cluster:
[1] 43204.79 37849.23 88308.60
(between_SS / total_SS = 63.5 %)
Available components:
[1] "cluster" "centers" "totss" "withinss"
[5] "tot.withinss" "betweenss" "size" "iter"
[9] "ifault"
table(clusters$cluster, df$Manufacturer)
G K Q
1 2 5 0
2 0 5 3
3 15 10 3
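The task asks for K = 2, 3, and 4, whereas the run above covers K = 3 only. A minimal sketch of fitting all three in one pass follows. It uses the numeric columns of the built-in `iris` data as a stand-in; with the cereal data, `x` would presumably be `df[, 2:9]`. The list name `fits` is illustrative.

```r
set.seed(2634)

# Stand-in numeric data; with the cereal data this would be df[, 2:9]
x <- iris[, 1:4]

# Fit K-means for K = 2, 3 and 4, keeping each fit in a named list
fits <- lapply(c(2, 3, 4), function(k) kmeans(x, centers = k, nstart = 10))
names(fits) <- paste0("K", c(2, 3, 4))

# Share of total variation explained, and cluster sizes, for each K
sapply(fits, function(f) f$betweenss / f$totss)
lapply(fits, function(f) f$size)
```

Each fit's `$cluster` vector can then be cross-tabulated against the hierarchical groups with `table()`, as was done for the manufacturers above.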
Phoenix!
# Writing the data into R
df <- data.frame(x = seq(from = 24, to = 33, by = 1), y = c(6.5, 7.1, 7, 7.1,
7.2, 7.1, 8.1, 8.7, 9.5, 10.1))
attach(df)
# Fitting/summarising the Regression model
fit <- lm(y ~ x, data = df)
summary(fit)
Call:
lm(formula = y ~ x, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.9236 -0.2655 0.0100 0.3591 0.6073
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.62727 1.61408 -1.628 0.142235
x 0.36727 0.05635 6.518 0.000185 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5118 on 8 degrees of freedom
Multiple R-squared: 0.8415, Adjusted R-squared: 0.8217
F-statistic: 42.48 on 1 and 8 DF, p-value: 0.0001846
# Fitting the model in R gives you answers but does not show how those
# answers were calculated so I'll do that here
# Manually defining values in R based on data/model fit
k <- length(fit$coefficients) - 1
n <- length(fit$residuals)
SSE <- sum(fit$residuals^2)
SSyy <- sum((df$y - mean(df$y))^2)  # total sum of squares; df$y avoids relying on attach()
res.std.err <- sqrt(SSE/(n - (1 + k)))
r2 <- (SSyy - SSE)/SSyy
# Equivalent R2 calculation: 1 - SSE/SSyy
Fstat <- ((SSyy - SSE)/k)/(SSE/(n - (k + 1)))
# Printing manually defined values
out <- data.frame(k, n, SSE, SSyy, res.std.err, r2, Fstat)
knitr::kable(out)
| k | n | SSE | SSyy | res.std.err | r2 | Fstat |
|---|---|---|---|---|---|---|
| 1 | 10 | 2.095636 | 13.224 | 0.511815 | 0.8415278 | 42.48204 |
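As a quick consistency check, the manual quantities can be compared with the components that `summary()` stores. The sketch below rebuilds the data and model from the code above and compares the residual standard error, R-squared, and F statistic; the object `s` is introduced here only for illustration.

```r
# Data copied from the regression example above
df <- data.frame(x = 24:33,
                 y = c(6.5, 7.1, 7, 7.1, 7.2, 7.1, 8.1, 8.7, 9.5, 10.1))
fit <- lm(y ~ x, data = df)
s <- summary(fit)

# Manual quantities, defined as in the code above
k <- length(coef(fit)) - 1
n <- nrow(df)
SSE <- sum(residuals(fit)^2)
SSyy <- sum((df$y - mean(df$y))^2)

# Each comparison should return TRUE
all.equal(sqrt(SSE / (n - (k + 1))), s$sigma)
all.equal((SSyy - SSE) / SSyy, s$r.squared)
all.equal(((SSyy - SSE) / k) / (SSE / (n - (k + 1))),
          unname(s$fstatistic["value"]))
```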