The extraction of biologically relevant information from a whole set of gene expression data involves grouping together genes that share similar expression patterns. One of the simplest ways of assessing similarity in expression pattern is to calculate the ‘Pearson correlation coefficient’ between the expression profiles of each possible pair of genes represented in the dataset.
Correlation is usually computed on two quantitative variables, but it can also be computed on two qualitative ordinal variables.
The Pearson correlation coefficient measures the linear dependence between two random variables. It is defined as the covariance of two variables divided by the product of their standard deviations.
\[\rho_{x, y} = \frac{E[(X - \mu_{x})(Y - \mu_{y})]}{\sigma_{x} \sigma_{y}}\]
The resulting \(\rho_{x, y}\) value lies between -1 and 1. A value of 1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship.
The Pearson correlation coefficient reflects the extent of a linear correlation between two variables. That means that two variables are perfectly correlated iff every pair of objects with different values for one of the variables also has a ‘proportional’ difference between their values for the other variable.
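As a small illustration of what "perfectly correlated" means, the sketch below uses made-up toy vectors (`x_toy`, `y_up`, `y_down` and `y_sq` are hypothetical names, not part of the dataset used in this section). An exact linear function of a variable should give a correlation of 1 or -1, while a non-linear but still increasing function should give a value below 1.

```r
# Hypothetical toy values, chosen only for illustration
x_toy <- c(1, 2, 4, 7, 11)

# An exact increasing linear function of x_toy
y_up <- 2 * x_toy + 3
# An exact decreasing linear function of x_toy
y_down <- -0.5 * x_toy + 10
# An increasing but non-linear function of x_toy
y_sq <- x_toy^2

cor(x_toy, y_up)    # 1, up to floating-point rounding
cor(x_toy, y_down)  # -1, up to floating-point rounding
cor(x_toy, y_sq)    # high, but strictly below 1
```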
set.seed(666)
x <- rnorm(50)
y <- rnorm(50)
randomData <- data.frame(x, y)
We’ve set the random seed, then created two vectors x and y, each independently sampled from a normal distribution with mean 0 and sd 1.
To compute the correlation between two vectors (of equal length), we will use a function called ‘cor()’. By default, this function computes Pearson’s correlation coefficient.
cor(x, y)
## [1] 0.008217302
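To connect ‘cor()’ back to the definition, the following sketch recomputes the coefficient by hand as the sample covariance divided by the product of the sample standard deviations. The vectors `u` and `v` are hypothetical toy data, used so that the random number stream of the running example is left untouched.

```r
# Hypothetical toy vectors, used here instead of the x and y above
u <- c(2.1, 3.4, 1.9, 5.6, 4.3, 3.8)
v <- c(1.0, 2.9, 2.2, 4.8, 3.1, 4.4)

# Sample covariance divided by the product of the sample standard deviations
r_manual <- cov(u, v) / (sd(u) * sd(v))

# The same quantity written out from centred values; the (n - 1) factors cancel
du <- u - mean(u)
dv <- v - mean(v)
r_explicit <- sum(du * dv) / sqrt(sum(du^2) * sum(dv^2))

# Both should agree with cor() up to floating-point tolerance
all.equal(r_manual, cor(u, v))
all.equal(r_explicit, cor(u, v))
```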
We can use a scatter plot to visualize the relationship between the two sets of values.
library("ggplot2")
ggplot(randomData, aes(x, y)) +
geom_point(color = "red3") +
geom_smooth(method = "gam")
## `geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'
Here the fitted curve is close to flat, and the points are too scattered to suggest any clear relationship. So we are going to test the null hypothesis that the true correlation between these two variables is zero.
Recall that the two vectors x and y were drawn independently from two normal distributions. The absolute value of the correlation we observed between x and y is 0.008217302.
cor.test(x, y)
##
## Pearson's product-moment correlation
##
## data: x and y
## t = 0.056933, df = 48, p-value = 0.9548
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2707497 0.2859111
## sample estimates:
## cor
## 0.008217302
As we can see, it has computed a 95% confidence interval for the correlation coefficient, a t-statistic with its degrees of freedom, and the p-value corresponding to that statistic. A p-value of 0.9548 implies that we would expect to observe a correlation at least this large in magnitude about 95% of the time when randomly sampling 50 values from each of two independent normal distributions.
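The t-statistic and p-value reported by ‘cor.test()’ can be reproduced from the correlation coefficient alone. The sketch below assumes the default two-sided Pearson test, uses the standard relation \(t = r\sqrt{n - 2}/\sqrt{1 - r^2}\) with \(n - 2\) degrees of freedom, and works on hypothetical toy vectors `u` and `v` rather than the x and y above.

```r
# Hypothetical toy vectors (not the x and y used above)
u <- c(2.1, 3.4, 1.9, 5.6, 4.3, 3.8, 2.7, 4.9)
v <- c(1.0, 2.9, 2.2, 4.8, 3.1, 4.4, 1.5, 3.6)

n <- length(u)
r <- cor(u, v)

# t-statistic for the null hypothesis of zero correlation, n - 2 degrees of freedom
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# Compare with the values reported by cor.test()
res <- cor.test(u, v)
all.equal(t_manual, unname(res$statistic))
all.equal(p_manual, unname(res$p.value))
```

Plugging in the values from the output above (r = 0.008217302, n = 50) gives the same t of roughly 0.0569 on 48 degrees of freedom.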
library("dplyr") # Required for piping (%>%) and mutate function
randomData <- randomData %>%
mutate(z = x + rnorm(50, sd = 0.1))
randomData
We have added another column, z, which is the vector x plus a random sample of 50 values from a normal distribution. This time the standard deviation is only 0.1, so the added values should be a lot smaller than the values of x they are being added to.
Now let’s compute the correlation between x and z.
cor(x, randomData$z)
## [1] 0.9954193
cor.test(x, randomData$z)
##
## Pearson's product-moment correlation
##
## data: x and randomData$z
## t = 72.135, df = 48, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9919001 0.9974115
## sample estimates:
## cor
## 0.9954193
Here the p-value is less than \(2.2 \times 10^{-16}\), which is extremely small, and the estimated correlation is 0.995. So these variables are very highly positively correlated.
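The size of this correlation can be anticipated from the way z was constructed. As a rough check, assume the population values \(\mathrm{Var}(x) = 1\) and \(\mathrm{Var}(e) = 0.1^2 = 0.01\) for the added noise \(e\), with \(x\) and \(e\) independent; the sample estimate will differ slightly from this idealised figure.

\[\rho_{x, z} = \frac{\mathrm{Cov}(x, x + e)}{\sigma_{x} \sigma_{z}} = \frac{\mathrm{Var}(x)}{\sqrt{\mathrm{Var}(x)\,(\mathrm{Var}(x) + \mathrm{Var}(e))}} = \frac{1}{\sqrt{1.01}} \approx 0.995\]

This is in line with the observed value of 0.9954193.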
Again let’s make a scatter plot.
ggplot(randomData, aes(x, z)) +
geom_point(color = "red3") +
geom_smooth(method = "gam")
## `geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'
One of the most widely used methods of revealing order in a large set of data is clustering. Its goal is to classify entities into (unspecified) groups based on their profiles.
Clustering is generally a form of ‘unsupervised learning’, in which a distance metric (a measure of dis-similarity between a pair of entities) is used to allow the most similar entities to be grouped together.
Let’s say we had two vectors x = \(\{x_{1}, x_{2}, ..., x_{n}\}\) and y = \(\{y_{1}, y_{2}, ..., y_{n}\}\), then the Euclidean distance would be expressed as \(\sqrt{\sum_{i = 1}^{n} (x_{i} - y_{i})^2}\).
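The Euclidean distance, that is the square root of the summed squared coordinate differences, can be checked against R's built-in ‘dist()’ function. The vectors `p` and `q` below are hypothetical toy values picked so that the arithmetic is easy to follow by hand.

```r
# Hypothetical toy vectors
p <- c(1, 2, 3)
q <- c(4, 6, 3)

# Square root of the summed squared differences
d_manual <- sqrt(sum((p - q)^2))
d_manual   # 5, since sqrt(3^2 + 4^2 + 0^2) = 5

# dist() treats each row of a matrix as one object
d_builtin <- dist(rbind(p, q))
all.equal(d_manual, as.numeric(d_builtin))
```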
Another distance metric can be derived from the correlation between variables. We can compute a distance between two vectors just by subtracting their correlation coefficient from 1.
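A minimal sketch of this correlation-based distance is shown below. The helper `cor_dist()` and the profiles `g1`, `g2` and `g3` are hypothetical names and values introduced only for illustration. The distance ranges from 0 for perfectly correlated profiles to 2 for perfectly anti-correlated ones, and it ignores differences in scale that would dominate a Euclidean distance.

```r
# Hypothetical toy expression profiles across five samples
g1 <- c(1, 2, 3, 4, 5)
g2 <- c(10, 20, 30, 40, 50)   # same shape as g1, different scale
g3 <- c(5, 4, 3, 2, 1)        # exact mirror image of g1

# Hypothetical helper: one minus the Pearson correlation coefficient
cor_dist <- function(a, b) 1 - cor(a, b)

cor_dist(g1, g2)   # 0 (up to rounding): perfectly correlated profiles
cor_dist(g1, g3)   # 2 (up to rounding): perfectly anti-correlated profiles

# Euclidean distance, by contrast, is driven by the difference in scale
sqrt(sum((g1 - g2)^2))
```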
Using the Pearson correlation coefficient to identify genes with similar expression patterns across a dataset is a simple but effective form of clustering. There are other approaches for performing clustering on a whole dataset at once.
Let’s load a built-in example dataset called ‘swiss’, which we will use to draw a dendrogram.
data("swiss")
Now the object ‘swiss’ has been added to the workspace. It contains a data.frame with the features in the columns and the objects (the provinces) in the rows.
dist(swiss) # Distance matrix of swiss
## Courtelary Delemont Franches-Mnt Moutier Neuveville
## Delemont 80.591776
## Franches-Mnt 88.214588 14.553130
## Moutier 31.876890 52.278149 60.529967
## Neuveville 27.417877 80.928749 91.039868 32.184346
## Porrentruy 83.246814 14.499410 18.902087 58.055663 86.756833
## Broye 98.701125 28.295231 33.805362 68.242409 92.325707
## Glane 101.772344 28.737648 30.262974 71.187795 96.805114
## Gruyere 95.136818 16.546265 18.943149 66.162678 93.667231
## Sarine 86.247240 12.793029 18.141951 58.922170 86.557602
## Veveyse 101.019664 25.695581 27.781902 70.799404 96.866674
## Aigle 48.526010 82.084361 93.952937 43.349077 23.647190
## Aubonne 53.121428 87.519226 99.091306 48.123799 27.560880
## Avenches 45.653159 84.196307 95.733332 42.565780 19.436638
## Cossonay 56.910189 89.750378 101.591271 52.164284 32.105538
## Echallens 59.546180 69.616805 81.459745 41.986366 38.126783
## Grandson 20.747183 83.837829 93.467534 34.052473 13.042607
## Lausanne 31.490197 86.405456 96.666406 48.006100 36.451509
## La Vallee 34.423918 97.144460 106.655579 55.371603 40.469125
## Lavaux 58.690241 89.444117 101.354840 52.635491 32.484341
## Morges 46.270324 84.488118 96.051022 43.402783 21.270047
## Moudon 42.358985 83.540124 94.778185 40.725576 20.804317
## Nyone 42.570558 76.741319 88.820761 39.306957 24.812505
## Orbe 45.144630 86.584696 98.214154 45.614744 24.730985
## Oron 56.440443 87.700420 98.685815 49.005580 31.865775
## Payerne 42.037399 81.569125 92.468151 37.853687 16.991024
## Paysd'enhaut 49.637587 85.354604 96.168787 44.167795 26.505282
## Rolle 48.490799 82.707644 94.717910 44.160871 24.885610
## Vevey 28.259335 76.281875 86.882355 37.391123 29.695623
## Yverdon 36.142490 81.330976 92.552850 36.954417 15.024766
## Conthey 114.537210 45.298200 50.022656 83.830624 105.520958
## Entremont 113.510609 44.838216 51.243325 83.577438 104.342659
## Herens 116.688095 48.143905 52.794034 85.830664 106.993063
## Martigwy 108.689328 38.814745 45.252664 78.895666 100.751824
## Monthey 101.162629 25.015083 28.946198 71.009383 96.791186
## St Maurice 108.182346 38.817759 46.222133 79.164601 100.580416
## Sierre 113.946566 44.006981 45.563512 82.357429 105.708467
## Sion 98.459265 23.748897 29.427451 69.052253 93.912507
## Boudry 26.409195 83.033718 93.195699 35.414580 12.592125
## La Chauxdfnd 22.611919 85.355214 93.231336 44.019205 40.516255
## Le Locle 10.900349 81.348106 89.606263 34.786671 28.363596
## Neuchatel 33.152400 84.200394 93.616507 47.543480 39.800975
## Val de Ruz 22.042234 80.966394 90.279150 30.116109 10.183128
## ValdeTravers 17.194653 84.346998 92.994960 38.062244 29.033086
## V. De Geneve 74.322166 94.666309 103.578297 81.525854 82.328685
## Rive Droite 63.876685 56.363624 69.856073 50.726577 57.433291
## Rive Gauche 64.802137 57.393729 68.628018 55.808903 66.799618
## Porrentruy Broye Glane Gruyere Sarine Veveyse
## Delemont
## Franches-Mnt
## Moutier
## Neuveville
## Porrentruy
## Broye 36.612817
## Glane 37.339766 10.246272
## Gruyere 21.318067 18.263691 18.185711
## Sarine 15.321426 25.780048 25.761762 12.999388
## Veveyse 32.690849 9.075660 6.726255 12.871037 22.257873
## Aigle 88.657219 87.563057 93.952858 92.111631 87.119858 93.810757
## Aubonne 94.866327 92.314064 98.433186 97.727478 93.605406 98.633643
## Avenches 90.869740 90.361310 96.163158 94.908364 89.495489 96.331679
## Cossonay 96.430454 93.051765 99.780838 98.945957 94.897911 99.712407
## Echallens 77.409734 70.665356 77.412542 77.654239 74.915368 77.451844
## Grandson 88.009448 97.421828 102.047242 97.055020 89.745899 101.413294
## Lausanne 87.154470 102.184919 107.186951 99.097193 89.620605 105.587452
## La Vallee 97.979520 112.603641 117.801019 109.541455 100.692169 116.118696
## Lavaux 96.883450 92.115634 98.582972 98.661588 94.727882 98.905980
## Morges 90.815668 90.535487 96.717862 94.820955 89.575122 96.595675
## Moudon 89.322408 91.679217 97.598512 94.894059 89.839187 97.175656
## Nyone 81.273027 85.204367 91.879108 87.327492 81.443278 90.792571
## Orbe 91.720210 94.344488 100.927754 97.261354 92.074005 100.120618
## Oron 95.666498 91.480339 97.238046 97.625780 94.197454 97.753077
## Payerne 88.544935 88.999294 94.221361 92.996095 87.711359 94.507589
## Paysd'enhaut 93.041765 92.107405 97.572640 96.535548 92.648003 97.825674
## Rolle 89.004621 89.110364 95.645615 93.134057 88.423558 95.233829
## Vevey 77.597887 91.104183 96.413173 88.714678 79.939329 94.750792
## Yverdon 86.632101 91.081406 96.760393 93.277837 87.332860 96.244429
## Conthey 53.268279 25.073683 29.542554 35.430800 46.444902 28.619748
## Entremont 51.387081 23.862709 30.189077 34.664825 44.509550 28.376661
## Herens 56.210452 25.450196 29.601446 37.916868 48.632545 29.469511
## Martigwy 44.766082 17.687908 25.085055 27.691950 37.285606 22.212215
## Monthey 31.725896 13.595841 16.571771 13.619930 25.411918 11.665852
## St Maurice 43.875849 22.608275 29.939773 28.960181 37.821190 26.409515
## Sierre 54.100112 23.662251 22.528426 34.612485 45.152479 25.059579
## Sion 30.765526 12.718113 16.311005 12.319318 20.291932 12.527107
## Boudry 87.248453 94.529164 99.544420 95.187722 87.607520 99.019595
## La Chauxdfnd 84.960570 103.339023 107.372328 98.485300 88.941037 105.568141
## Le Locle 83.175913 98.706671 102.518211 95.119202 86.070701 101.358631
## Neuchatel 84.784801 99.472835 103.745398 96.317457 85.861875 102.505639
## Val de Ruz 86.107259 94.010927 98.265386 94.195647 87.228998 98.002855
## ValdeTravers 85.820198 100.509403 104.933456 97.529536 89.003387 103.526285
## V. De Geneve 90.289052 110.824456 115.486633 103.791035 93.366437 112.551601
## Rive Droite 57.692544 66.320332 73.473212 64.850887 58.587904 70.796698
## Rive Gauche 53.966264 72.231367 78.181193 66.015495 57.659106 74.560367
## Aigle Aubonne Avenches Cossonay Echallens Grandson
## Delemont
## Franches-Mnt
## Moutier
## Neuveville
## Porrentruy
## Broye
## Glane
## Gruyere
## Sarine
## Veveyse
## Aigle
## Aubonne 12.566324
## Avenches 9.159591 10.856592
## Cossonay 12.098760 9.936926 14.226458
## Echallens 22.515603 23.543681 25.220486 23.298163
## Grandson 30.220165 34.017068 27.373288 37.175400 44.460882
## Lausanne 46.828390 55.875626 47.435877 56.210000 62.258880 32.059571
## La Vallee 50.232926 58.488925 51.363590 57.882890 67.661825 32.896238
## Lavaux 13.398224 7.976522 13.584848 7.206969 22.761582 39.620343
## Morges 4.987394 12.009230 6.935416 11.661394 24.935535 27.217731
## Moudon 15.140343 13.759451 12.355893 17.236879 26.886472 23.050996
## Nyone 14.977463 25.338644 20.175086 23.861106 28.742192 26.564932
## Orbe 12.821950 18.027060 16.359184 16.370535 30.263344 25.385035
## Oron 20.369202 9.416841 17.326884 15.547231 23.062524 38.213741
## Payerne 15.695034 13.172001 8.814760 19.659809 25.762005 24.783561
## Paysd'enhaut 20.209691 11.104238 17.118321 20.007189 26.833367 31.951332
## Rolle 6.639277 11.678720 11.634178 12.824196 23.741954 29.696909
## Vevey 38.155519 47.547935 39.327483 47.668224 50.682123 25.436894
## Yverdon 15.868094 18.827610 13.134645 22.111725 30.270778 17.263545
## Conthey 97.173793 100.344774 100.849583 101.176243 78.693711 110.720134
## Entremont 95.417428 99.235669 99.230905 99.549935 77.385273 109.497554
## Herens 98.332906 101.285699 101.786762 101.988393 79.350236 112.550566
## Martigwy 92.780405 97.372974 96.646888 97.473276 75.369076 105.523815
## Monthey 92.624403 97.137132 95.685444 98.311037 76.055640 100.741979
## St Maurice 92.445668 97.330181 96.613130 97.480242 75.943134 105.072916
## Sierre 99.877393 102.582826 102.580509 104.280341 81.420437 111.633533
## Sion 89.991478 95.669816 93.326524 96.599379 75.113094 98.473554
## Boudry 25.387005 32.260231 23.573631 33.255075 41.037500 11.114063
## La Chauxdfnd 55.315666 62.874084 54.887973 63.396695 67.302512 31.479836
## Le Locle 46.272022 52.862298 44.955357 54.975995 58.778741 20.352553
## Neuchatel 51.771228 61.579968 51.864729 61.607467 65.463336 37.639798
## Val de Ruz 29.384732 31.899843 25.649203 36.290667 41.433355 7.453784
## ValdeTravers 44.013145 50.436043 43.066674 51.408160 56.783734 18.568051
## V. De Geneve 87.329791 98.311316 90.297941 97.169030 97.509741 79.848930
## Rive Droite 51.834912 61.137923 56.733588 60.689802 51.604001 59.617421
## Rive Gauche 66.400724 76.521981 70.604391 75.799011 67.673680 66.103638
## Lausanne La Vallee Lavaux Morges Moudon Nyone
## Delemont
## Franches-Mnt
## Moutier
## Neuveville
## Porrentruy
## Broye
## Glane
## Gruyere
## Sarine
## Veveyse
## Aigle
## Aubonne
## Avenches
## Cossonay
## Echallens
## Grandson
## Lausanne
## La Vallee 17.209346
## Lavaux 58.798749 61.705722
## Morges 46.047523 48.548186 13.932412
## Moudon 46.823371 49.241008 19.748478 12.458094
## Nyone 35.867686 40.352572 27.218376 16.196237 19.454419
## Orbe 42.585891 43.177222 21.223562 11.245039 12.420644 13.134824
## Oron 63.130057 65.523603 13.122256 19.415687 18.163546 32.660337
## Payerne 49.439199 53.365779 18.734249 13.424604 11.004731 24.294405
## Paysd'enhaut 57.719516 60.101482 18.626283 19.185122 14.379207 29.880870
## Rolle 46.855972 50.040732 14.748030 8.431495 12.376995 14.437327
## Vevey 13.576911 23.570874 50.611011 37.606953 37.674310 25.891358
## Yverdon 39.528599 43.037106 24.211931 13.420391 7.747671 16.117121
## Conthey 117.128391 126.855720 99.936865 100.702981 101.465837 95.905448
## Entremont 113.968745 124.290751 98.439197 99.042882 100.133489 93.460749
## Herens 119.389749 129.267639 100.576765 101.821868 102.779815 97.604865
## Martigwy 109.087316 119.308743 96.714086 96.264391 97.492428 90.042115
## Monthey 105.037480 115.251485 97.711230 95.630540 95.587290 89.086342
## St Maurice 107.213817 117.662773 96.806758 96.091461 97.224542 88.979416
## Sierre 119.927155 129.712205 102.518703 103.053932 103.835946 99.775109
## Sion 100.212466 110.950450 95.861254 93.176714 94.388114 86.147235
## Boudry 29.584119 31.444251 35.931440 22.525810 23.209265 21.593527
## La Chauxdfnd 23.190567 22.378329 66.997034 53.328919 51.242491 44.892232
## Le Locle 23.232996 24.873377 57.653139 44.216966 41.892601 38.079606
## Neuchatel 14.384231 25.194501 63.733009 50.957886 53.317258 42.208866
## Val de Ruz 37.734197 39.525212 37.867491 26.482402 22.090100 28.393818
## ValdeTravers 24.424611 22.629185 55.034045 41.539937 38.654714 35.239184
## V. De Geneve 49.238429 57.970390 99.425852 88.404537 91.016056 74.727371
## Rive Droite 49.338245 61.399010 61.564341 55.199275 57.355977 41.626123
## Rive Gauche 48.881473 60.654286 77.704634 68.760454 69.919426 53.757010
## Orbe Oron Payerne Paysd'enhaut Rolle
## Delemont
## Franches-Mnt
## Moutier
## Neuveville
## Porrentruy
## Broye
## Glane
## Gruyere
## Sarine
## Veveyse
## Aigle
## Aubonne
## Avenches
## Cossonay
## Echallens
## Grandson
## Lausanne
## La Vallee
## Lavaux
## Morges
## Moudon
## Nyone
## Orbe
## Oron 25.399803
## Payerne 20.286717 15.599644
## Paysd'enhaut 22.726628 10.419482 12.797222
## Rolle 9.994518 19.988807 16.292026 17.834113
## Vevey 34.275466 54.073409 40.717845 49.120261 38.201408
## Yverdon 13.023440 24.397541 12.443749 19.451005 14.062873
## Conthey 103.759578 99.051179 99.667901 99.852604 97.732339
## Entremont 101.960435 98.553378 98.651166 99.552923 96.042395
## Herens 105.313864 99.741366 100.681542 101.050352 99.106248
## Martigwy 99.059011 96.976253 96.074830 97.766354 93.643513
## Monthey 98.195776 96.428276 93.847643 95.986434 93.324595
## St Maurice 98.324308 97.492234 96.268109 97.775508 92.970456
## Sierre 107.306839 100.475786 100.537520 101.265740 101.022560
## Sion 96.160111 95.824240 92.195770 95.337783 91.217115
## Boudry 22.682734 37.524770 23.977742 33.589040 26.910593
## La Chauxdfnd 49.465727 67.804661 54.134773 62.279153 55.417821
## Le Locle 41.810171 57.415089 43.186805 51.210698 46.450619
## Neuchatel 50.033373 68.142501 53.764171 63.857494 53.441370
## Val de Ruz 26.999498 34.750178 21.173512 28.442540 29.358176
## ValdeTravers 37.691411 55.014294 41.810362 49.424064 43.862568
## V. De Geneve 85.206335 106.016289 93.482844 101.458062 87.714562
## Rive Droite 53.927385 67.027762 59.345598 64.019348 51.381262
## Rive Gauche 66.250863 82.397785 72.491241 78.601927 66.153398
## Vevey Yverdon Conthey Entremont Herens Martigwy
## Delemont
## Franches-Mnt
## Moutier
## Neuveville
## Porrentruy
## Broye
## Glane
## Gruyere
## Sarine
## Veveyse
## Aigle
## Aubonne
## Avenches
## Cossonay
## Echallens
## Grandson
## Lausanne
## La Vallee
## Lavaux
## Morges
## Moudon
## Nyone
## Orbe
## Oron
## Payerne
## Paysd'enhaut
## Rolle
## Vevey
## Yverdon 30.692501
## Conthey 105.815181 102.100745
## Entremont 102.895619 100.503116 9.671137
## Herens 107.990470 103.578473 5.657217 10.459082
## Martigwy 98.037238 97.443777 14.154240 8.485776 15.677423
## Monthey 94.015784 94.939688 22.392858 22.656160 25.124259 17.067150
## St Maurice 96.528079 96.942104 17.405818 10.810846 20.195633 7.490661
## Sierre 108.683991 104.277417 16.825056 23.702287 16.040935 24.732772
## Sion 89.935460 93.050862 27.795223 25.908541 30.110611 19.017805
## Boudry 22.275448 17.037617 108.587421 106.808209 110.144834 102.611966
## La Chauxdfnd 22.837226 44.893943 119.915330 117.725877 122.280678 112.082242
## Le Locle 20.267896 35.246339 114.557410 112.972482 116.800721 107.864024
## Neuchatel 19.940702 45.955766 116.741998 113.458176 118.697921 107.972643
## Val de Ruz 30.264172 17.290948 107.265920 106.495090 108.963714 102.680330
## ValdeTravers 19.856890 32.710587 115.752035 114.003864 117.925199 108.970483
## V. De Geneve 55.469941 84.259347 125.980423 120.757176 128.729583 115.026234
## Rive Droite 42.277310 53.502700 76.349580 71.485471 78.914922 67.570710
## Rive Gauche 44.070817 64.916969 85.302253 80.238161 88.258931 74.815419
## Monthey St Maurice Sierre Sion Boudry
## Delemont
## Franches-Mnt
## Moutier
## Neuveville
## Porrentruy
## Broye
## Glane
## Gruyere
## Sarine
## Veveyse
## Aigle
## Aubonne
## Avenches
## Cossonay
## Echallens
## Grandson
## Lausanne
## La Vallee
## Lavaux
## Morges
## Moudon
## Nyone
## Orbe
## Oron
## Payerne
## Paysd'enhaut
## Rolle
## Vevey
## Yverdon
## Conthey
## Entremont
## Herens
## Martigwy
## Monthey
## St Maurice 19.360413
## Sierre 24.180108 29.831862
## Sion 12.066155 20.134371 28.962336
## Boudry 98.995050 102.326407 109.734341 95.830079
## La Chauxdfnd 105.527460 111.058511 121.296533 102.056463 32.270248
## Le Locle 101.310513 107.079436 115.085697 98.016489 22.945588
## Neuchatel 103.464100 106.668972 118.256423 97.708127 32.933569
## Val de Ruz 97.591662 102.583713 107.350874 95.577244 14.106470
## ValdeTravers 103.135614 108.237970 117.066930 100.329071 20.778616
## V. De Geneve 112.055631 111.432887 130.954131 104.590966 76.109450
## Rive Droite 67.704240 63.911008 83.383457 62.348857 55.938145
## Rive Gauche 72.224803 71.037053 91.736835 66.378460 62.933251
## La Chauxdfnd Le Locle Neuchatel Val de Ruz ValdeTravers
## Delemont
## Franches-Mnt
## Moutier
## Neuveville
## Porrentruy
## Broye
## Glane
## Gruyere
## Sarine
## Veveyse
## Aigle
## Aubonne
## Avenches
## Cossonay
## Echallens
## Grandson
## Lausanne
## La Vallee
## Lavaux
## Morges
## Moudon
## Nyone
## Orbe
## Oron
## Payerne
## Paysd'enhaut
## Rolle
## Vevey
## Yverdon
## Conthey
## Entremont
## Herens
## Martigwy
## Monthey
## St Maurice
## Sierre
## Sion
## Boudry
## La Chauxdfnd
## Le Locle 13.862355
## Neuchatel 24.346394 25.475478
## Val de Ruz 36.409647 24.209347 41.845460
## ValdeTravers 13.565751 9.053999 28.584837 23.895657
## V. De Geneve 60.283435 66.751812 47.428856 84.499922 69.038222
## Rive Droite 61.603487 59.227731 52.393226 60.977550 60.125522
## Rive Gauche 57.294516 59.080641 49.749855 68.482331 60.474643
## V. De Geneve Rive Droite
## Delemont
## Franches-Mnt
## Moutier
## Neuveville
## Porrentruy
## Broye
## Glane
## Gruyere
## Sarine
## Veveyse
## Aigle
## Aubonne
## Avenches
## Cossonay
## Echallens
## Grandson
## Lausanne
## La Vallee
## Lavaux
## Morges
## Moudon
## Nyone
## Orbe
## Oron
## Payerne
## Paysd'enhaut
## Rolle
## Vevey
## Yverdon
## Conthey
## Entremont
## Herens
## Martigwy
## Monthey
## St Maurice
## Sierre
## Sion
## Boudry
## La Chauxdfnd
## Le Locle
## Neuchatel
## Val de Ruz
## ValdeTravers
## V. De Geneve
## Rive Droite 56.901126
## Rive Gauche 42.678567 21.457866
As you can see, the output is a fairly large (47 x 47) distance matrix, printed as its lower triangle, because the data frame has 47 rows. We can apply a hierarchical clustering algorithm to the distance matrix using the R function ‘hclust’.
hclust(dist(swiss))
##
## Call:
## hclust(d = dist(swiss))
##
## Cluster method : complete
## Distance : euclidean
## Number of objects: 47
If we plot this dendrogram using the plot() function, it will look like this.
plot(hclust(dist(swiss)))
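A dendrogram does not by itself assign objects to groups. One possible way to obtain explicit group labels is to cut the tree with ‘cutree()’ from base R. In the sketch below the number of groups, k = 3, is an arbitrary choice made for illustration, and `tree` and `groups` are names introduced here.

```r
data("swiss")

# Complete linkage on Euclidean distances, the defaults used above
tree <- hclust(dist(swiss))

# Cut the tree into k groups; k = 3 is an arbitrary illustrative choice
groups <- cutree(tree, k = 3)

table(groups)   # number of rows assigned to each group
head(groups)    # group label for the first few rows
```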
Here the built-in dataset swiss has the features as columns, whereas most molecular biology datasets have the features as rows. Let’s transpose the data before computing the distance matrix, so that the six columns are clustered instead of the 47 rows, and apply a hierarchical clustering algorithm.
hclust(dist(t(swiss)))
##
## Call:
## hclust(d = dist(t(swiss)))
##
## Cluster method : complete
## Distance : euclidean
## Number of objects: 6
Lastly, plot the dendrogram for the transposed data using the plot() function.
plot(hclust(dist(t(swiss))))
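The correlation-based distance described earlier can be used in place of the Euclidean distance when clustering the features. The following is a sketch rather than a recommended analysis: it relies on ‘cor()’ returning the matrix of pairwise correlations between the columns of a data frame, and on ‘as.dist()’ to convert one minus that matrix into a distance object. The names `feature_cor`, `feature_dist` and `feature_tree` are introduced here.

```r
data("swiss")

# 6 x 6 matrix of Pearson correlations between the columns of swiss
feature_cor <- cor(swiss)

# One minus the correlation, converted to a 'dist' object
feature_dist <- as.dist(1 - feature_cor)

# Hierarchical clustering (complete linkage by default) and dendrogram
feature_tree <- hclust(feature_dist)
plot(feature_tree)
```

Unlike the Euclidean version, this distance depends only on how the features co-vary, not on the units in which they are measured.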