Author: Daniel R. Brown, Jr.

ALY6020 Predictive Analytics

Professor Stewart Huang

January 14th, 2018

Using k-NN algorithm for predictive classification

Discussion

Defining k-NN

According to Lantz, k-NN is an algorithm that “uses information about an example’s k-nearest neighbors to classify unlabeled examples. The letter k is a variable term implying that any number of nearest neighbors could be used. After choosing k, the algorithm requires a training dataset made up of examples that have been classified into several categories, as labeled by a nominal variable. Then, for each unlabeled record in the test dataset, k-NN identifies k records in the training data that are the ‘nearest’ in similarity. The unlabeled test instance is assigned the class of the majority of the k nearest neighbors.”(Lantz 2015)

That is to say, by using the k-NN algorithm you can classify previously unclassified data by comparing how similar, or dissimilar, records are to one another. This dissimilarity can be measured in various ways, such as the Euclidean distance from geometry. The Euclidean distance \(d\) between two points \(p\) and \(q\) in \(n\)-dimensional space is given by the following formula:

\[d \left(p,q\right)= \sqrt{\left(p_{1}-q_{1}\right)^2 + \left(p_{2}-q_{2}\right)^2 + ... +\left(p_{n}-q_{n}\right)^2}\] This formula is a generalization of the Pythagorean theorem, \(a^2 + b^2 = c^2\), which gives the length of the hypotenuse of a right triangle from the lengths of its other two sides.
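As a quick illustration, the same calculation can be written as a small R function (euclidean_dist is a hypothetical helper used only in this sketch; the knn() function used later in this report handles distances internally):

euclidean_dist <- function(p, q) {
  # square the coordinate-wise differences, sum them, and take the square root
  sqrt(sum((p - q)^2))
}
euclidean_dist(c(0, 0), c(3, 4))   # 5, the hypotenuse of a 3-4-5 right triangle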

There are other ways of measuring the dissimilarity between points. For example, one can use the Pearson correlation \(\rho\), the Jackknife correlation \(\varrho\), the Goodman-Kruskal correlation \(\gamma\), and many others (Campello and Jaskowiak 2011).

Strengths of k-NN

k-NN is one of the simplest learning algorithms in a data scientist’s toolkit. It shines when fast training and simplicity are desired, even on large datasets, and it makes no assumptions about the underlying distribution of the data. Finally, it is an effective predictive tool that can be applied quickly and easily to a dataset for a first pass at classification.

Weaknesses of k-NN

k-NN is not without its weaknesses. For example, it can overfit the data. Overfitting is “[t]he production of an analysis which corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably” (Oxford Living Dictionaries, n.d.). Using a large value for k makes the classification less sensitive to noise in the data, but this can be an instance of “throwing the baby out with the bathwater”: the algorithm may also ignore small but important patterns. Balancing these two effects is known as the bias-variance tradeoff, and because k must be chosen by the analyst, selecting an appropriate value can be difficult.

Nominal features, or features that have values that cannot necessarily be compared numerically to one another (like colors or labels), require additional processing prior to using the k-NN algorithm.

Another weakness of k-NN is that, despite a very fast training phase, the classification phase is comparatively slow. This is because no abstraction or computation is performed on the training data until a prediction is requested, which is why k-NN is known as a lazy learner.

Finally, k-NN does not create a model of the data, which can make it difficult to understand what the classifications obtained actually mean.

Selecting an appropriate k

Square Root of N method

As mentioned above, it can be difficult to select an appropriate value for k. According to Lantz, a good rule of thumb is \[ k = \sqrt{N} \] where \(N\) is the number of training examples. For the 469 training examples used later in this report, \(\sqrt{469}\approx 21.7\), which motivates the choice of \(k = 21\) in the first model.

k-fold cross validation method

Another method to determine an appropriate value of k is known as k-fold cross validation. k-fold cross validation divides the data into k subsets and applies the learning algorithm k times. Each time, one of the subsets is held out and used as the test set while the remaining \(k-1\) subsets are used for training, so every observation is used for training \(k-1\) times and for testing exactly once (Schneider 1997). This method is more thorough, but because the algorithm is applied k times, the time required goes up by a factor of k. It is important to note that the k in k-fold cross validation is different from the k in the k-NN algorithm. A value of \(k=10\) is typically chosen, so the method is often called 10-fold cross validation.
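A rough sketch of how this could be coded by hand for k-NN is below; cv_accuracy, features, and labels are hypothetical names, and the caret package used later in this report automates the same procedure.

library(class)  # provides knn()

cv_accuracy <- function(features, labels, k_value, folds = 10) {
  # randomly assign each row to one of the folds
  fold_id <- sample(rep(1:folds, length.out = nrow(features)))
  acc <- numeric(folds)
  for (f in 1:folds) {
    test_idx <- which(fold_id == f)
    # hold fold f out for testing; train on the remaining folds
    pred <- knn(train = features[-test_idx, ], test = features[test_idx, ],
                cl = labels[-test_idx], k = k_value)
    acc[f] <- mean(pred == labels[test_idx])
  }
  mean(acc)   # average accuracy across the folds
}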

Feature scaling

It is important to note that the features in a dataset may have very different ranges. If the distance formula were applied to the unmodified features, features with larger ranges could dominate or mask features with smaller ranges. Because of this, it is important to prepare the data with feature scaling.

Min-Max Normalization

There are a few different ways an analyst could scale the data. The traditional method of feature scaling used in k-NN classification is called min-max normalization. This method rescales all of the values within a feature so that they fall between 0 and 1, using the equation \[ X_{\textrm{new}}=\frac{X-\textrm{min}\left(X\right)}{\textrm{max}\left(X\right)-\textrm{min}\left(X\right)} \] This formula takes any value \(X\), subtracts the minimum value of that feature, and then divides by the difference between the feature’s largest and smallest values.
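Applied to a toy feature that ranges from 100 to 300, the rescaling works out as follows (the normalize() function defined later in this report applies the same formula to each column of the real data):

x <- c(100, 150, 300)
(x - min(x)) / (max(x) - min(x))   # 0.00 0.25 1.00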

z-score Standardization

Another method of scaling features is known as z-score standardization. z-score standardization rescales a feature using its \(z\)-score, which expresses how many standard deviations a value lies above or below the feature’s mean. The equation for scaling is given as

\[ X_{\textrm{new}}=\frac{X-\mu}{\sigma}=\frac{X-\bar{X}}{\textrm{StDev}\left(X\right)} =z \]

The new value of the feature \(X\) is rescaled to the \(z\)-score. Unlike in the min-max normalization seen above, these numbers can be positive or negative, and there is no minimum or maximum value that the new \(X\) must be between.
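As a small sketch, the same arithmetic applied to a toy vector (R’s built-in scale() function, used later in this report, performs the equivalent calculation column by column):

x <- c(2, 4, 6, 8)
(x - mean(x)) / sd(x)   # -1.16 -0.39  0.39  1.16, the same values as as.numeric(scale(x))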

Dummy Coding

Another form of preprocessing that must be done is known as dummy coding, which is important for nominal features (features that are categorical rather than numerical). Dummy coding gives each category in a feature its own new 0/1 feature: 1 means the record belongs to that category and 0 means it does not. You then keep \(N-1\) of the new features, where \(N\) is the total number of categories. No information is lost, because a record with 0 in every retained feature must belong to the dropped category. In this way, all of the new features take only the values 0 or 1, which places them on the same scale as min-max normalization.
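A minimal sketch of dummy coding a hypothetical three-category feature called color (the breast-cancer and iris data used below contain no nominal predictors, so this step is not needed there):

color <- factor(c("red", "green", "blue", "green"))
dummies <- data.frame(color_red   = as.integer(color == "red"),
                      color_green = as.integer(color == "green"))
dummies   # a row of all zeros means the dropped category, "blue"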

Applying the k-NN Algorithm

According to Lantz, there are five steps required to apply the k-NN algorithm to a data set. They are:

1. Data Collection
2. Data Exploration & Preparation
3. Model Training
4. Model Evaluation
5. Model Improvement

Data Collection

The dataset that will be analyzed contains different measurements of cells from cancer biopsies, such as radius, texture, smoothness, perimeter, area, compactness, concavity, concave points, symmetry, and fractal dimension. This data was obtained by a minimally invasive procedure from 569 different patients (Mangasarian, Street, and Wolberg 1995).

Data Exploration and Preparation

First, we will begin by importing the dataset; in this case, the modified version of the dataset provided with the textbook was used.

We load the packages used throughout this analysis and then use the read.csv() function to import the dataset.

library(class)    # knn()
library(gmodels)  # CrossTable()
library(caret)    # trainControl(), train()
library(e1071)    # model helpers used by caret

wbcd <- read.csv("wisc_bc_data.csv", stringsAsFactors = FALSE)

Following along with the textbook, we can subset the data, starting with removing the first feature, an ID column that carries no predictive information.

wbcd <- wbcd[-1]

Next, we can make a table showing how many benign and malignant tumors there are. After that, we can show the percentage of each kind of tumor using prop.table().

table(wbcd$diagnosis)

  B   M 
357 212 
wbcd$diagnosis <- factor(wbcd$diagnosis, levels = c("B", "M"),
                         labels = c("Benign", "Malignant"))
round(prop.table(table(wbcd$diagnosis)) * 100, digits = 1)

   Benign Malignant 
     62.7      37.3 

We can also take a look at a summary of three of the features.

summary(wbcd[c("radius_mean", "area_mean", "smoothness_mean")])
  radius_mean       area_mean      smoothness_mean  
 Min.   : 6.981   Min.   : 143.5   Min.   :0.05263  
 1st Qu.:11.700   1st Qu.: 420.3   1st Qu.:0.08637  
 Median :13.370   Median : 551.1   Median :0.09587  
 Mean   :14.127   Mean   : 654.9   Mean   :0.09636  
 3rd Qu.:15.780   3rd Qu.: 782.7   3rd Qu.:0.10530  
 Max.   :28.110   Max.   :2501.0   Max.   :0.16340  

The above shows us that the features are all scaled differently and need to be normalized.

normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
wbcd_n <- as.data.frame(lapply(wbcd[2:31], normalize))
summary(wbcd_n$area_mean)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.1174  0.1729  0.2169  0.2711  1.0000 

The data has been normalized. Now we need to split the data into training and test sets.

wbcd_train <- wbcd_n[1:469, ]
wbcd_test <- wbcd_n[470:569, ]
wbcd_train_labels <- wbcd[1:469, 1]
wbcd_test_labels <- wbcd[470:569, 1]
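The sequential split above relies on the rows of the data file already being in random order; if that were not the case, a random split (as is done for the iris data later in this report) would be safer. A sketch under that assumption, using hypothetical names so as not to disturb the objects used below:

set.seed(1)                             # for a reproducible split
rand_idx <- sample(nrow(wbcd_n), 469)   # draw 469 training rows at random
wbcd_train_r <- wbcd_n[rand_idx, ]
wbcd_test_r  <- wbcd_n[-rand_idx, ]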

Model Training

Now we can train the model on the data we just prepared. The knn() function performs training and prediction in a single step; here \(k = 21\) is an odd number close to \(\sqrt{469}\approx 21.7\), following the rule of thumb discussed above.

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 21)

Model Evaluation

Finally, we can take a look at the model we just created and determine how well it performed.

CrossTable(x = wbcd_test_labels, y = wbcd_test_pred,
           prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  100 

 
                 | wbcd_test_pred 
wbcd_test_labels |    Benign | Malignant | Row Total | 
-----------------|-----------|-----------|-----------|
          Benign |        61 |         0 |        61 | 
                 |     1.000 |     0.000 |     0.610 | 
                 |     0.968 |     0.000 |           | 
                 |     0.610 |     0.000 |           | 
-----------------|-----------|-----------|-----------|
       Malignant |         2 |        37 |        39 | 
                 |     0.051 |     0.949 |     0.390 | 
                 |     0.032 |     1.000 |           | 
                 |     0.020 |     0.370 |           | 
-----------------|-----------|-----------|-----------|
    Column Total |        63 |        37 |       100 | 
                 |     0.630 |     0.370 |           | 
-----------------|-----------|-----------|-----------|

 

The model misclassified only 2 of the 100 test cases (two malignant masses predicted as benign), for 98 percent accuracy. But how does it hold up? Let’s try a different method of standardization to find out.
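If a single summary number is preferred, the overall accuracy can be computed directly from the predictions and the true labels:

mean(wbcd_test_pred == wbcd_test_labels)   # 0.98 for this model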

Model Improvement

Now we can apply \(z\)-score standardization to the data and see if this makes a difference.

wbcd_z <- as.data.frame(scale(wbcd[-1]))
wbcd_train <- wbcd_z[1:469, ]
wbcd_test <- wbcd_z[470:569, ]
wbcd_train_labels <- wbcd[1:469, 1]
wbcd_test_labels <- wbcd[470:569, 1]

After rescaling, we can repeat the same training and evaluation steps as before.

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 21)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred,
           prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  100 

 
                 | wbcd_test_pred 
wbcd_test_labels |    Benign | Malignant | Row Total | 
-----------------|-----------|-----------|-----------|
          Benign |        61 |         0 |        61 | 
                 |     1.000 |     0.000 |     0.610 | 
                 |     0.924 |     0.000 |           | 
                 |     0.610 |     0.000 |           | 
-----------------|-----------|-----------|-----------|
       Malignant |         5 |        34 |        39 | 
                 |     0.128 |     0.872 |     0.390 | 
                 |     0.076 |     1.000 |           | 
                 |     0.050 |     0.340 |           | 
-----------------|-----------|-----------|-----------|
    Column Total |        66 |        34 |       100 | 
                 |     0.660 |     0.340 |           | 
-----------------|-----------|-----------|-----------|

 

With z-score standardization, 5 malignant masses are now misclassified as benign, so our new model is a little less reliable than the first one. Further evaluation and improvement is required.

Further Improvement

We could go back to the first model that we created and modify some of the inputs to see whether different k values give a better fit; a quick sweep over several candidate values, sketched just after the re-split below, can help decide which ones to try.

wbcd_train <- wbcd_n[1:469, ]
wbcd_test <- wbcd_n[470:569, ]
wbcd_train_labels <- wbcd[1:469, 1]
wbcd_test_labels <- wbcd[470:569, 1]
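As a rough guide (not part of the textbook walk-through), the following loop counts the misclassified test cases for several candidate values of k before we examine individual values below:

for (k_try in c(1, 5, 11, 15, 21, 27)) {
  pred <- knn(train = wbcd_train, test = wbcd_test,
              cl = wbcd_train_labels, k = k_try)
  cat("k =", k_try, " errors =", sum(pred != wbcd_test_labels), "\n")
}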

First, let’s try an even number of neighbors (even values are usually avoided because they can produce tied votes).

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 22)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred,
           prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  100 

 
                 | wbcd_test_pred 
wbcd_test_labels |    Benign | Malignant | Row Total | 
-----------------|-----------|-----------|-----------|
          Benign |        61 |         0 |        61 | 
                 |     1.000 |     0.000 |     0.610 | 
                 |     0.953 |     0.000 |           | 
                 |     0.610 |     0.000 |           | 
-----------------|-----------|-----------|-----------|
       Malignant |         3 |        36 |        39 | 
                 |     0.077 |     0.923 |     0.390 | 
                 |     0.047 |     1.000 |           | 
                 |     0.030 |     0.360 |           | 
-----------------|-----------|-----------|-----------|
    Column Total |        64 |        36 |       100 | 
                 |     0.640 |     0.360 |           | 
-----------------|-----------|-----------|-----------|

 

With \(k = 22\), 3 malignant cases are misclassified rather than 2, so these results are still not as good as those obtained with the first model, despite the slightly larger value of k.

Next, we can try picking a value of k that is \(\frac{\sqrt{N}}{2}\), or 11.

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 11)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred,
           prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  100 

 
                 | wbcd_test_pred 
wbcd_test_labels |    Benign | Malignant | Row Total | 
-----------------|-----------|-----------|-----------|
          Benign |        61 |         0 |        61 | 
                 |     1.000 |     0.000 |     0.610 | 
                 |     0.953 |     0.000 |           | 
                 |     0.610 |     0.000 |           | 
-----------------|-----------|-----------|-----------|
       Malignant |         3 |        36 |        39 | 
                 |     0.077 |     0.923 |     0.390 | 
                 |     0.047 |     1.000 |           | 
                 |     0.030 |     0.360 |           | 
-----------------|-----------|-----------|-----------|
    Column Total |        64 |        36 |       100 | 
                 |     0.640 |     0.360 |           | 
-----------------|-----------|-----------|-----------|

 

And we obtain exactly the same result as for \(k=22\): 3 misclassified malignant cases. Finally, let’s double our initial k and subtract 1, giving 41.

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 41)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred,
           prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  100 

 
                 | wbcd_test_pred 
wbcd_test_labels |    Benign | Malignant | Row Total | 
-----------------|-----------|-----------|-----------|
          Benign |        61 |         0 |        61 | 
                 |     1.000 |     0.000 |     0.610 | 
                 |     0.938 |     0.000 |           | 
                 |     0.610 |     0.000 |           | 
-----------------|-----------|-----------|-----------|
       Malignant |         4 |        35 |        39 | 
                 |     0.103 |     0.897 |     0.390 | 
                 |     0.062 |     1.000 |           | 
                 |     0.040 |     0.350 |           | 
-----------------|-----------|-----------|-----------|
    Column Total |        65 |        35 |       100 | 
                 |     0.650 |     0.350 |           | 
-----------------|-----------|-----------|-----------|

 

With \(k = 41\), the accuracy drops further: 4 malignant cases are misclassified as benign. None of the alternative k values improved on the original choice of \(k = 21\).

Evaluating the Iris dataset

Now we can take a look at the iris dataset (Anderson 1936), provided by the University of California, Irvine machine learning repository (Lichman 2013). This dataset contains 150 observations of 5 features and can be classified easily using k-NN. Below we will follow the same steps outlined by Lantz.

Data Preparation and Exploration - Iris

The iris dataset is already included in R, so it will be easy to preprocess. The features included in the data set are Sepal Length, Sepal Width, Petal Length, Petal Width, and Species.

str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
table(iris$Species)

    setosa versicolor  virginica 
        50         50         50 
round(prop.table(table(factor(iris$Species, levels = c("setosa", 
                                                     "versicolor", 
                                                     "virginica"),
                              labels = c("Setosa", "Versicolor",
                                         "Virginica"))))
      * 100, digits = 1)

    Setosa Versicolor  Virginica 
      33.3       33.3       33.3 

Looking at the structure of the dataset using str() shows us that we need to normalize the data. This can be done with our normalization function created above.

iris_n <- as.data.frame(lapply(iris[1:4], normalize))
iris_n <- cbind(iris_n, iris[5])

Next, we can split the dataset into two sets: a training set and a test set. For this dataset, I used the R function sample() to randomly select roughly 3/4 of the data (112 of the 150 rows) as a training set, with the remaining rows serving as a test set to assess the model. (Because no seed is set before the call to sample(), the particular split, and therefore the exact counts in the cross tables below, will vary from run to run.)

randomIris <- sample(nrow(iris), 112)
iris_test <- iris_n[-randomIris, ]
iris_train <- iris_n[randomIris,]

Next we can use the caret package to perform repeated k-fold cross-validation, which can give us some insight into the best value of k for our k-NN model.

set.seed(123)
ctrl <- trainControl(method = "repeatedcv", repeats = 5)
iris_pred <- train(Species ~ ., data = iris_train, method = "knn")
iris_pred
k-Nearest Neighbors 

112 samples
  4 predictors
  3 classes: 'setosa', 'versicolor', 'virginica' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 112, 112, 112, 112, 112, 112, ... 
Resampling results across tuning parameters:

  k  Accuracy   Kappa    
  5  0.9436228  0.9146894
  7  0.9466854  0.9193047
  9  0.9466643  0.9192784

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 7.

Note that because the ctrl object was never passed to train() through its trControl argument, caret fell back on its default bootstrap resampling (25 repetitions), as the output above shows, rather than the repeated cross-validation we configured. Even so, the resampling selected \(k = 7\) as the optimal value, with \(k = 9\) essentially tied (both round to an accuracy of 0.947). We will start by fitting the model with \(k = 9\) and compare other choices afterward.

Training our model - iris

Finally, we can train the model using our training data. The first four columns are the numeric features, and the Species column supplies the class labels.

iris_test_pred <- knn(train = iris_train[1:4], test = iris_test[1:4],
                      cl = iris_train$Species, k = 9)
CrossTable(x = iris_test$Species, y = iris_test_pred,
           prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  38 

 
                  | iris_test_pred 
iris_test$Species |     setosa | versicolor |  virginica |  Row Total | 
------------------|------------|------------|------------|------------|
           setosa |         10 |          0 |          0 |         10 | 
                  |      1.000 |      0.000 |      0.000 |      0.263 | 
                  |      1.000 |      0.000 |      0.000 |            | 
                  |      0.263 |      0.000 |      0.000 |            | 
------------------|------------|------------|------------|------------|
       versicolor |          0 |         13 |          0 |         13 | 
                  |      0.000 |      1.000 |      0.000 |      0.342 | 
                  |      0.000 |      1.000 |      0.000 |            | 
                  |      0.000 |      0.342 |      0.000 |            | 
------------------|------------|------------|------------|------------|
        virginica |          0 |          0 |         15 |         15 | 
                  |      0.000 |      0.000 |      1.000 |      0.395 | 
                  |      0.000 |      0.000 |      1.000 |            | 
                  |      0.000 |      0.000 |      0.395 |            | 
------------------|------------|------------|------------|------------|
     Column Total |         10 |         13 |         15 |         38 | 
                  |      0.263 |      0.342 |      0.395 |            | 
------------------|------------|------------|------------|------------|

 

According to the cross table, the model classified every one of the 38 test flowers correctly: all setosa, versicolor, and virginica cases were predicted without error. Perhaps we could use the \(\sqrt{N}\) rule of thumb to evaluate our model further.

Tuning our model - iris

\(\sqrt{150}\approx 12.2\); rounding to a convenient odd number gives 13. Let’s refit the model with \(k = 13\) to see what results we obtain.

iris_test_pred2 <- knn(train = iris_train[1:4], test = iris_test[1:4],
                      cl = iris_train$Species, k = 13)
CrossTable(x = iris_test$Species, y = iris_test_pred2,
           prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  38 

 
                  | iris_test_pred2 
iris_test$Species |     setosa | versicolor |  virginica |  Row Total | 
------------------|------------|------------|------------|------------|
           setosa |         10 |          0 |          0 |         10 | 
                  |      1.000 |      0.000 |      0.000 |      0.263 | 
                  |      1.000 |      0.000 |      0.000 |            | 
                  |      0.263 |      0.000 |      0.000 |            | 
------------------|------------|------------|------------|------------|
       versicolor |          0 |         13 |          0 |         13 | 
                  |      0.000 |      1.000 |      0.000 |      0.342 | 
                  |      0.000 |      0.929 |      0.000 |            | 
                  |      0.000 |      0.342 |      0.000 |            | 
------------------|------------|------------|------------|------------|
        virginica |          0 |          1 |         14 |         15 | 
                  |      0.000 |      0.067 |      0.933 |      0.395 | 
                  |      0.000 |      0.071 |      1.000 |            | 
                  |      0.000 |      0.026 |      0.368 |            | 
------------------|------------|------------|------------|------------|
     Column Total |         10 |         14 |         14 |         38 | 
                  |      0.263 |      0.368 |      0.368 |            | 
------------------|------------|------------|------------|------------|

 

This time one virginica flower is misclassified as versicolor, so the results with \(k = 13\) are slightly worse than the perfect classification obtained with \(k = 9\).

Third attempt at tuning

Let’s try \(k\approx\frac{\sqrt{N}}{2}\approx 6\); using the nearest odd value gives \(7\), which is also the value caret selected earlier.

iris_test_pred <- knn(train = iris_train[1:4], test = iris_test[1:4],
                      cl = iris_train$Species, k = 7)
CrossTable(x = iris_test$Species, y = iris_test_pred,
           prop.chisq = FALSE)

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  38 

 
                  | iris_test_pred 
iris_test$Species |     setosa | versicolor |  virginica |  Row Total | 
------------------|------------|------------|------------|------------|
           setosa |         10 |          0 |          0 |         10 | 
                  |      1.000 |      0.000 |      0.000 |      0.263 | 
                  |      1.000 |      0.000 |      0.000 |            | 
                  |      0.263 |      0.000 |      0.000 |            | 
------------------|------------|------------|------------|------------|
       versicolor |          0 |         12 |          1 |         13 | 
                  |      0.000 |      0.923 |      0.077 |      0.342 | 
                  |      0.000 |      1.000 |      0.062 |            | 
                  |      0.000 |      0.316 |      0.026 |            | 
------------------|------------|------------|------------|------------|
        virginica |          0 |          0 |         15 |         15 | 
                  |      0.000 |      0.000 |      1.000 |      0.395 | 
                  |      0.000 |      0.000 |      0.938 |            | 
                  |      0.000 |      0.000 |      0.395 |            | 
------------------|------------|------------|------------|------------|
     Column Total |         10 |         12 |         16 |         38 | 
                  |      0.263 |      0.316 |      0.421 |            | 
------------------|------------|------------|------------|------------|

 

With \(k = 7\), the model again makes a single error, this time classifying one versicolor flower as virginica. Of the values tried on this particular random split, \(k = 9\) gave the best results, and all three models are within one misclassification of one another on a 38-flower test set.

References

Anderson, Edgar. 1936. “The Species Problem in Iris.” Annals of the Missouri Botanical Garden. doi:10.2307/2394164.

Campello, Ricardo J. G. B., and Pablo A. Jaskowiak. 2011. “Comparing Correlation Coefficients as Dissimilarity Measures for Cancer Classification in Gene Expression Data.” Brazilian Symposium on Biometrics. doi:10.1.1.208.993.

Oxford Living Dictionaries. n.d. “Definition of Overfitting.” https://en.oxforddictionaries.com/definition/overfitting.

Lantz, Brett. 2015. Machine Learning with R. Birmingham, United Kingdom: Packt Publishing.

Lichman, M. 2013. “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer Sciences. http://archive.ics.uci.edu/ml.

Mangasarian, Olvi L., W. Nick Street, and William H. Wolberg. 1995. “Breast Cancer Diagnosis and Prognosis via Linear Programming.” Operations Research 43 (4): 570–77. https://doi.org/10.1287/opre.43.4.570.

Schneider, Jeff. 1997. “Cross Validation.” https://www.cs.cmu.edu/~schneide/tut5/node42.html.
