Unsupervised techniques are often used in the analysis of genomic data. In particular, PCA and hierarchical clustering are popular tools. We illustrate these techniques on the NCI60 cancer cell line microarray data, which consists of 6,830 gene expression measurements on 64 cancer cell lines.

library(ISLR)
nci.labs=NCI60$labs
nci.data=NCI60$data

Each cell line is labeled with a cancer type. We do not make use of the cancer types in performing PCA and clustering, as these are unsupervised techniques. But after performing PCA and clustering, we will check to see the extent to which these cancer types agree with the results of these unsupervised techniques.

The data has 64 rows and 6,830 columns.

dim(nci.data)
[1]   64 6830

We begin by examining the cancer types for the cell lines.

nci.labs[1:4]
[1] "CNS"   "CNS"   "CNS"   "RENAL"
table(nci.labs)
nci.labs
     BREAST         CNS       COLON K562A-repro K562B-repro    LEUKEMIA MCF7A-repro MCF7D-repro 
          7           5           7           1           1           6           1           1 
   MELANOMA       NSCLC     OVARIAN    PROSTATE       RENAL     UNKNOWN 
          8           9           6           2           9           1 

PCA on the NCI60 Data

We first perform PCA on the data after scaling the variables (genes) to have standard deviation one, although one could reasonably argue that it is better not to scale the genes.

pr.out=prcomp(nci.data, scale=TRUE)

We now plot the first few principal component score vectors, in order to visualize the data. The observations (cell lines) corresponding to a given cancer type will be plotted in the same color, so that we can see to what extent the observations within a cancer type are similar to each other. We first create a simple function that assigns a distinct color to each distinct value of a vector. The function will be used to assign a color to each of the 64 cell lines, based on the cancer type to which it corresponds.

Cols=function(vec){
  cols=rainbow(length(unique(vec)))
  return(cols[as.numeric(as.factor(vec))])
}
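
To see what Cols() returns, the following minimal sketch applies it to a small made-up label vector (toy.labs is hypothetical and not part of NCI60). Elements with the same label should receive the same color, and the number of distinct colors should equal the number of distinct labels.

```r
# Same helper as in the lab: one rainbow color per distinct label.
Cols = function(vec) {
  cols = rainbow(length(unique(vec)))
  return(cols[as.numeric(as.factor(vec))])
}

# Hypothetical toy labels, for illustration only.
toy.labs = c("CNS", "CNS", "RENAL", "BREAST", "RENAL")
toy.cols = Cols(toy.labs)

# Equal labels share a color; distinct labels get distinct colors.
print(data.frame(label = toy.labs, color = toy.cols))
```

Because as.factor() sorts the labels into levels, the colors are handed out in the sorted order of the labels rather than in order of first appearance.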

Note that the rainbow() function takes as its argument a positive integer, and returns a vector containing that number of distinct colors. We now can plot the principal component score vectors.

par(mfrow=c(1,2))
plot(pr.out$x[,1:2], col=Cols(nci.labs), pch=19, xlab="Z1",ylab="Z2")
plot(pr.out$x[,c(1,3)], col=Cols(nci.labs), pch=19, xlab="Z1",ylab="Z3")

The resulting plots are shown in Figure 10.15. On the whole, cell lines corresponding to a single cancer type do tend to have similar values on the first few principal component score vectors. This indicates that cell lines from the same cancer type tend to have pretty similar gene expression levels.

We can obtain a summary of the proportion of variance explained (PVE) by each principal component using the summary() method for a prcomp object:

summary(pr.out)
Importance of components:
                           PC1      PC2      PC3      PC4      PC5      PC6      PC7      PC8
Standard deviation     27.8535 21.48136 19.82046 17.03256 15.97181 15.72108 14.47145 13.54427
Proportion of Variance  0.1136  0.06756  0.05752  0.04248  0.03735  0.03619  0.03066  0.02686
Cumulative Proportion   0.1136  0.18115  0.23867  0.28115  0.31850  0.35468  0.38534  0.41220
                            PC9     PC10     PC11     PC12     PC13     PC14     PC15     PC16
Standard deviation     13.14400 12.73860 12.68672 12.15769 11.83019 11.62554 11.43779 11.00051
Proportion of Variance  0.02529  0.02376  0.02357  0.02164  0.02049  0.01979  0.01915  0.01772
Cumulative Proportion   0.43750  0.46126  0.48482  0.50646  0.52695  0.54674  0.56590  0.58361
                           PC17     PC18     PC19    PC20     PC21    PC22    PC23    PC24    PC25
Standard deviation     10.65666 10.48880 10.43518 10.3219 10.14608 10.0544 9.90265 9.64766 9.50764
Proportion of Variance  0.01663  0.01611  0.01594  0.0156  0.01507  0.0148 0.01436 0.01363 0.01324
Cumulative Proportion   0.60024  0.61635  0.63229  0.6479  0.66296  0.6778 0.69212 0.70575 0.71899
                          PC26    PC27   PC28    PC29    PC30    PC31    PC32    PC33    PC34    PC35
Standard deviation     9.33253 9.27320 9.0900 8.98117 8.75003 8.59962 8.44738 8.37305 8.21579 8.15731
Proportion of Variance 0.01275 0.01259 0.0121 0.01181 0.01121 0.01083 0.01045 0.01026 0.00988 0.00974
Cumulative Proportion  0.73174 0.74433 0.7564 0.76824 0.77945 0.79027 0.80072 0.81099 0.82087 0.83061
                          PC36    PC37    PC38    PC39    PC40    PC41   PC42    PC43   PC44    PC45
Standard deviation     7.97465 7.90446 7.82127 7.72156 7.58603 7.45619 7.3444 7.10449 7.0131 6.95839
Proportion of Variance 0.00931 0.00915 0.00896 0.00873 0.00843 0.00814 0.0079 0.00739 0.0072 0.00709
Cumulative Proportion  0.83992 0.84907 0.85803 0.86676 0.87518 0.88332 0.8912 0.89861 0.9058 0.91290
                         PC46    PC47    PC48    PC49    PC50    PC51    PC52    PC53    PC54    PC55
Standard deviation     6.8663 6.80744 6.64763 6.61607 6.40793 6.21984 6.20326 6.06706 5.91805 5.91233
Proportion of Variance 0.0069 0.00678 0.00647 0.00641 0.00601 0.00566 0.00563 0.00539 0.00513 0.00512
Cumulative Proportion  0.9198 0.92659 0.93306 0.93947 0.94548 0.95114 0.95678 0.96216 0.96729 0.97241
                          PC56    PC57   PC58    PC59    PC60    PC61    PC62    PC63      PC64
Standard deviation     5.73539 5.47261 5.2921 5.02117 4.68398 4.17567 4.08212 4.04124 2.148e-14
Proportion of Variance 0.00482 0.00438 0.0041 0.00369 0.00321 0.00255 0.00244 0.00239 0.000e+00
Cumulative Proportion  0.97723 0.98161 0.9857 0.98940 0.99262 0.99517 0.99761 1.00000 1.000e+00
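
The last component in the printout has a standard deviation of about 2e-14, which is zero up to rounding error. One likely explanation is that prcomp() centers the columns, and 64 centered observations span at most 63 dimensions. The sketch below illustrates this on simulated data (the matrix X and its dimensions are made up for illustration). It also checks that, with scale=TRUE, the component variances sum to the number of variables.

```r
set.seed(1)
n = 10   # observations (hypothetical)
p = 50   # variables, with p > n as in NCI60
X = matrix(rnorm(n * p), nrow = n, ncol = p)

pr = prcomp(X, scale = TRUE)

# prcomp() returns min(n, p) components ...
length(pr$sdev)
# ... but centering removes one dimension, so the last one is numerically zero.
pr$sdev[n]
# Each scaled variable has variance one, so the variances sum to p.
sum(pr$sdev^2)
```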

Using the plot() function, we can also plot the variance explained by the first few principal components.

plot(pr.out)

Note that the height of each bar in the bar plot is given by squaring the corresponding element of pr.out$sdev. However, it is more informative to plot the PVE of each principal component (i.e. a scree plot) and the cumulative PVE of each principal component. This can be done with just a little work.

pve=100*pr.out$sdev^2/sum(pr.out$sdev^2)
par(mfrow=c(1,2))
plot(pve, type="o", ylab="PVE", xlab="Principal Component", col="blue")
plot(cumsum(pve), type="o", ylab="Cumulative PVE", xlab="Principal Component", col="brown3")

(Note that the elements of pve can also be obtained from the summary: summary(pr.out)$importance[2,] gives the PVE as proportions, so multiplying by 100 recovers pve, and 100*summary(pr.out)$importance[3,] gives cumsum(pve).) The resulting plots are shown in Figure 10.16. We see that together, the first seven principal components explain around 40% of the variance in the data. This is not a huge amount of the variance. However, looking at the scree plot, we see that while each of the first seven principal components explains a substantial amount of variance, there is a marked decrease in the variance explained by further principal components. That is, there is an elbow in the plot after approximately the seventh principal component. This suggests that there may be little benefit to examining more than seven or so principal components (though even examining seven principal components may be difficult).
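
The relationship between pve and the summary can be checked numerically. The following is a minimal sketch on simulated data (X is made up for illustration). It assumes, as the printout suggests, that the summary reports proportions rounded to five decimal places, so agreement holds only up to the factor of 100 and that rounding.

```r
set.seed(1)
X = matrix(rnorm(20 * 5), nrow = 20)   # simulated data, for illustration
pr = prcomp(X, scale = TRUE)

pve = 100 * pr$sdev^2 / sum(pr$sdev^2)   # percentages, as in the lab
imp = summary(pr)$importance             # proportions, as printed by summary()

# Largest discrepancies after rescaling the summary to percentages.
max(abs(pve - 100 * imp[2, ]))
max(abs(cumsum(pve) - 100 * imp[3, ]))
```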

Clustering the Observations of the NCI60 Data

We now proceed to hierarchically cluster the cell lines in the NCI60 data, with the goal of finding out whether or not the observations cluster into distinct types of cancer. To begin, we standardize the variables to have mean zero and standard deviation one. As mentioned earlier, this step is optional and should be performed only if we want each gene to be on the same scale.

sd.data=scale(nci.data)

We now perform hierarchical clustering of the observations using complete, single, and average linkage. Euclidean distance is used as the dissimilarity measure.

par(mfrow=c(1,3))
data.dist=dist(sd.data)
plot(hclust(data.dist), labels=nci.labs, main="Complete Linkage", xlab="", sub="", ylab="")
plot(hclust(data.dist, method="average"), labels=nci.labs, main="Average Linkage", xlab="", sub="", ylab="")
plot(hclust(data.dist, method="single"), labels=nci.labs, main="Single Linkage", xlab="", sub="", ylab="")

The results are shown in Figure 10.17. We see that the choice of linkage certainly does affect the results obtained. Typically, single linkage will tend to yield trailing clusters: very large clusters onto which individual observations attach one-by-one. On the other hand, complete and average linkage tend to yield more balanced, attractive clusters. For this reason, complete and average linkage are generally preferred to single linkage. Clearly cell lines within a single cancer type do tend to cluster together, although the clustering is not perfect. We will use complete linkage hierarchical clustering for the analysis that follows.
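
The trailing behavior of single linkage can be seen on a very small example. The sketch below uses six hypothetical points on a line with steadily growing gaps (not the NCI60 data) and compares the cluster sizes obtained when each dendrogram is cut into three clusters.

```r
# Six hypothetical points on a line with steadily growing gaps.
x = c(0, 1, 2.1, 3.3, 4.6, 6.0)
d = dist(x)

hc.single   = hclust(d, method = "single")
hc.complete = hclust(d, method = "complete")

# Cluster sizes when each dendrogram is cut into three clusters.
sizes.single   = as.vector(table(cutree(hc.single, 3)))
sizes.complete = as.vector(table(cutree(hc.complete, 3)))

sizes.single    # 4 1 1: one large cluster plus two singletons
sizes.complete  # 2 2 2: balanced clusters
```

Single linkage keeps attaching the nearest point to the growing cluster, while complete linkage penalizes clusters with a large diameter and so pairs up neighbors instead.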

We can cut the dendrogram at the height that will yield a particular number of clusters, say four:

hc.out=hclust(dist(sd.data))
hc.clusters=cutree(hc.out, 4)
table(hc.clusters, nci.labs)
           nci.labs
hc.clusters BREAST CNS COLON K562A-repro K562B-repro LEUKEMIA MCF7A-repro MCF7D-repro MELANOMA NSCLC
          1      2   3     2           0           0        0           0           0        8     8
          2      3   2     0           0           0        0           0           0        0     1
          3      0   0     0           1           1        6           0           0        0     0
          4      2   0     5           0           0        0           1           1        0     0
           nci.labs
hc.clusters OVARIAN PROSTATE RENAL UNKNOWN
          1       6        2     8       1
          2       0        0     1       0
          3       0        0     0       0
          4       0        0     0       0

There are some clear patterns. All the leukemia cell lines fall in cluster 3, while the breast cancer cell lines are spread out over three different clusters. We can plot the cut on the dendrogram that produces these four clusters:

par(mfrow=c(1,1))
plot(hc.out, labels=nci.labs)
abline(h=139, col="red")

The abline() function draws a straight line on top of any existing plot in R. The argument h=139 plots a horizontal line at height 139 on the dendrogram; this is the height that results in four distinct clusters. It is easy to verify that the resulting clusters are the same as the ones we obtained using cutree(hc.out,4).
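
One way to carry out this kind of check is to call cutree() with a height instead of a number of clusters and compare the two label vectors. The following is a minimal sketch on hypothetical toy data (six points on a line, not the NCI60 data), with the cut height chosen between two consecutive merge heights.

```r
# Hypothetical toy data: six points on a line (not the NCI60 data).
x = c(0, 1, 2.1, 3.3, 4.6, 6.0)
hc = hclust(dist(x))          # complete linkage by default

hc$height                     # merge heights, in increasing order

by.k = cutree(hc, k = 3)      # ask for three clusters
by.h = cutree(hc, h = 2)      # cut between the 3rd and 4th merge heights

all(by.k == by.h)             # TRUE when both cuts give the same clusters
table(by.k, by.h)             # all counts fall on the diagonal

# On the lab's objects, the analogous check would be
# table(cutree(hc.out, h = 139), hc.clusters)
```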

Printing the output of hclust gives a useful brief summary of the object:

hc.out

Call:
hclust(d = dist(sd.data))

Cluster method   : complete 
Distance         : euclidean 
Number of objects: 64 

We claimed earlier in Section 10.3.2 that K-means clustering and hierarchical clustering with the dendrogram cut to obtain the same number of clusters can yield very different results. How do these NCI60 hierarchical clustering results compare to what we get if we perform K-means clustering with K = 4?

set.seed(2)
km.out=kmeans(sd.data, 4, nstart=20)
km.clusters=km.out$cluster
table(km.clusters, hc.clusters)
           hc.clusters
km.clusters  1  2  3  4
          1 11  0  0  9
          2 20  7  0  0
          3  9  0  0  0
          4  0  0  8  0

We see that the four clusters obtained using hierarchical clustering and K-means clustering are somewhat different. Cluster 4 in K-means clustering is identical to cluster 3 in hierarchical clustering. However, the other clusters differ: for instance, cluster 2 in K-means clustering contains a portion of the observations assigned to cluster 1 by hierarchical clustering, as well as all of the observations assigned to cluster 2 by hierarchical clustering.
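
Cluster numbers are arbitrary labels, so it helps to read such a table mechanically rather than by matching numbers. The sketch below re-enters the counts from the printed table and flags every pair of clusters that coincides exactly, meaning one cell holds the whole row and the whole column. The object names tab, row.tot, col.tot, and matches are introduced here for illustration.

```r
# Counts copied from the printed table (rows: K-means, columns: hierarchical).
tab = matrix(c(11, 0, 0, 9,
               20, 7, 0, 0,
                9, 0, 0, 0,
                0, 0, 8, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("km", 1:4), paste0("hc", 1:4)))

# Row and column totals, expanded to the shape of the table.
row.tot = rowSums(tab)[as.vector(row(tab))]
col.tot = colSums(tab)[as.vector(col(tab))]

# A pair of clusters coincides when one cell holds the entire row
# and the entire column.
same = tab > 0 & tab == row.tot & tab == col.tot
matches = which(same, arr.ind = TRUE)
matches
```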

Rather than performing hierarchical clustering on the entire data matrix, we can simply perform hierarchical clustering on the first few principal component score vectors, as follows:

hc.out=hclust(dist(pr.out$x[,1:5]))
plot(hc.out, labels=nci.labs, main="Hier. Clust. on First Five Score Vectors")

table(cutree(hc.out, 4), nci.labs)
   nci.labs
    BREAST CNS COLON K562A-repro K562B-repro LEUKEMIA MCF7A-repro MCF7D-repro MELANOMA NSCLC OVARIAN
  1      0   2     7           0           0        2           0           0        1     8       5
  2      5   3     0           0           0        0           0           0        7     1       1
  3      0   0     0           1           1        4           0           0        0     0       0
  4      2   0     0           0           0        0           1           1        0     0       0
   nci.labs
    PROSTATE RENAL UNKNOWN
  1        2     7       0
  2        0     2       1
  3        0     0       0
  4        0     0       0

Not surprisingly, these results are different from the ones that we obtained when we performed hierarchical clustering on the full data set. Sometimes performing clustering on the first few principal component score vectors can give better results than performing clustering on the full data. In this situation, we might view the principal component step as one of denoising the data. We could also perform K-means clustering on the first few principal component score vectors rather than the full data set.
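
The K-means variant mentioned above can be sketched as follows. The data here are simulated (the matrix X and the group labels grp are made up for illustration), and the choice of five score vectors and four clusters simply mirrors the lab.

```r
set.seed(1)
# Simulated stand-in for an expression matrix: 40 rows, 200 columns,
# with four groups of 10 rows whose means are shifted apart (made-up data).
grp = rep(1:4, each = 10)
X = matrix(rnorm(40 * 200), nrow = 40) + 2 * grp

pr = prcomp(X, scale = TRUE)
scores = pr$x[, 1:5]               # first five principal component score vectors

km.pc = kmeans(scores, 4, nstart = 20)
table(km.pc$cluster, grp)          # compare clusters with the simulated groups

# On the lab's objects, the analogous call would be
# kmeans(pr.out$x[, 1:5], 4, nstart = 20)
```

As with hierarchical clustering, the number of score vectors to keep is a judgment call, and the scree plot can help guide it.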
