This is perhaps the best known database to be found in the pattern recognition literature. Fisher’s paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. Source: https://archive.ics.uci.edu/ml/datasets/iris. The dataset comes installed with R.
ggplot2: https://cran.r-project.org/web/packages/ggplot2/index.html and GGally: https://cran.r-project.org/web/packages/GGally/index.html
The ggpairs function from the GGally library plots a summary of the dataset, including the pairwise correlations between variables.
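A minimal sketch of such a call, assuming the GGally and ggplot2 packages are installed. Colouring by Species is an illustrative choice here, not necessarily the setting used for the original figure.

```r
# Assumes GGally and ggplot2 are installed from CRAN.
library(ggplot2)
library(GGally)

# Scatterplot matrix of the four measurements plus Species.
# Mapping colour to Species makes the class separation visible
# in every panel (an assumption made for this sketch).
pairs_plot <- ggpairs(iris, mapping = aes(colour = Species))

# Draw the matrix on the active graphics device.
print(pairs_plot)
```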
We can observe that either petal dimension creates the greatest separation between Setosas and the other species. Hence, Setosas can be classified using either Petal.Width or Petal.Length. The criterion could be: every datapoint with a Petal.Length no greater than the maximum petal length of the Setosas is a Setosa.
Similarly, every datapoint beyond the maximum Versicolor petal width and length is a Virginica. But there is an area, where the two species overlap, that requires further specification. We will determine the remaining rules visually.
The following parallel coordinates diagrams show, step by step, the datapoints that remain at each stage of the classification process.
STEP 1: Setosas are easily classified, hence they are removed. Similarly, the Versicolors and Virginicas beyond the extreme values are also removed. The following diagram shows the intersection of Versicolor and Virginica datapoints.
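The Setosa criterion can be sketched in base R as follows. The variable names are hypothetical, and the comparison uses `<=` so that the Setosa with the longest petal is itself included.

```r
# Largest petal length observed among the Setosas.
setosa_max_length <- max(iris$Petal.Length[iris$Species == "setosa"])

# Rule: flag a datapoint as Setosa when its petal length
# does not exceed that maximum.
is_setosa_by_rule <- iris$Petal.Length <= setosa_max_length

# Cross-tabulate the rule against the true species.
setosa_check <- table(rule = is_setosa_by_rule, species = iris$Species)
print(setosa_check)
```

On R's built-in iris data this rule should flag exactly the 50 Setosas, because the shortest petal in the other two species is longer than the longest Setosa petal.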
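A sketch of the Virginica criterion, under one possible reading: a datapoint is flagged when it exceeds either Versicolor maximum. Whether the rule should require one maximum or both to be exceeded is an assumption of this sketch, and the variable names are hypothetical.

```r
# Extreme petal dimensions observed among the Versicolors.
versicolor <- iris[iris$Species == "versicolor", ]
versicolor_max_width <- max(versicolor$Petal.Width)
versicolor_max_length <- max(versicolor$Petal.Length)

# Assumption: exceeding EITHER maximum is enough to call a
# datapoint a Virginica.
is_virginica_by_rule <- iris$Petal.Width > versicolor_max_width |
  iris$Petal.Length > versicolor_max_length

# Cross-tabulate the rule against the true species; the Virginicas
# left in the FALSE row sit in the area that needs further rules.
virginica_check <- table(rule = is_virginica_by_rule, species = iris$Species)
print(virginica_check)
```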
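As a starting point, a parallel coordinates diagram of the full dataset, before any datapoints are removed, could be drawn with ggparcoord from GGally. This is a sketch that assumes GGally is installed and uses the function's default scaling. It is not necessarily the code behind the diagrams in the steps.

```r
# Assumes GGally is installed from CRAN.
library(GGally)

# One line per flower across the four measurements (columns 1 to 4),
# coloured by Species (column 5).
parcoord_plot <- ggparcoord(iris, columns = 1:4, groupColumn = 5)

# Draw the diagram on the active graphics device.
print(parcoord_plot)
```

Passing a filtered subset of iris instead of the full data frame would give the diagram for a later step.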
STEP 2: Versicolor datapoints above the maximum value were removed.
STEP 3: Virginicas with the maximum value of Petal.Width were removed.
STEP 4: At this point 139/150 (93%) datapoints are correctly classified. The model could be fitted further with a second pass on the Sepal.Width variable, at the risk of overfitting.
The variable Petal.Width has the highest classification potential, followed by Sepal.Width. These visualizations inform the implementation of a classification tree, which needs further evaluation of its precision and accuracy.
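The share of correctly classified datapoints can be computed as in the sketch below. It applies only the two simple petal rules (Setosa by maximum petal length, Virginica by exceeding either Versicolor maximum), which is an assumption of this sketch and not the exact sequence of steps above, so the printed figure may differ from the one reported.

```r
# Thresholds taken from the observed extremes of each species.
setosa_max_length <- max(iris$Petal.Length[iris$Species == "setosa"])
versicolor <- iris[iris$Species == "versicolor", ]
versicolor_max_width <- max(versicolor$Petal.Width)
versicolor_max_length <- max(versicolor$Petal.Length)

# Hypothetical rule set: Setosa first, then Virginica,
# and everything left over is called Versicolor.
predicted <- ifelse(
  iris$Petal.Length <= setosa_max_length, "setosa",
  ifelse(
    iris$Petal.Width > versicolor_max_width |
      iris$Petal.Length > versicolor_max_length,
    "virginica", "versicolor"
  )
)
predicted <- factor(predicted, levels = levels(iris$Species))

# Count the matches against the true species.
n_correct <- sum(predicted == iris$Species)
accuracy <- n_correct / nrow(iris)
cat(sprintf("%d/%d correct (%.0f%%)\n", n_correct, nrow(iris), 100 * accuracy))
```

Because the thresholds are derived from the same 150 datapoints they are scored on, this figure is optimistic, which is the overfitting risk mentioned above.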
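One way to start that evaluation is to fit a tree with the rpart package, which ships with standard R distributions, and compare its predictions against the true species. This is a sketch with default rpart settings, not a tuned model. The splits rpart chooses need not match the thresholds found visually.

```r
# rpart is a recommended package bundled with standard R installs.
library(rpart)

# Fit a classification tree on all four measurements.
tree_fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict on the training data itself (optimistic, see the note below).
tree_pred <- predict(tree_fit, iris, type = "class")

# Confusion matrix: rows are predictions, columns are true species.
confusion <- table(predicted = tree_pred, actual = iris$Species)
print(confusion)

# Accuracy: share of datapoints on the diagonal.
tree_accuracy <- sum(diag(confusion)) / sum(confusion)

# Precision per species: correct predictions divided by all
# predictions made for that species (row totals).
tree_precision <- diag(confusion) / rowSums(confusion)
print(tree_accuracy)
print(tree_precision)
```

These numbers are measured on the training data, so a held-out test set or cross-validation would be needed for an honest estimate.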