The features of the Breast Cancer Dataset are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The linear program used to obtain the separating plane in the 3-dimensional space is described in: [K. P. Bennett and O. L. Mangasarian: “Robust Linear Programming Discrimination of Two Linearly Inseparable Sets”, Optimization Methods and Software 1, 1992, 23-34].
The dataset can be downloaded from Kaggle.
Attribute Information:
Malignant is a term for diseases in which abnormal cells divide without control and can invade nearby tissues. Malignant cells can also spread to other parts of the body through the blood and lymph systems. It means the cell is cancerous.
Benign means not cancerous. Benign tumors may grow larger but do not spread to other parts of the body. They are also called nonmalignant.
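Ten real-valued features are computed for each cell nucleus: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension.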
The mean, standard error, and “worst” or largest value (the mean of the three largest values) of each of these features were computed for each image, resulting in 30 features. For instance, column 3 is the mean radius, column 13 is the radius standard error, and column 23 is the worst radius.
All feature values are recoded with four significant digits.
Missing attribute values: none
Class distribution: 357 benign, 212 malignant.
For our diagnoses, malignant is represented as 1 and benign as 0.
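The 0/1 encoding can be sketched as follows. This is a minimal illustration, assuming the raw diagnosis column uses the labels "M" and "B"; the vector `raw_diagnosis` and its values are hypothetical.

```r
# Hypothetical raw labels, assumed to be "M" (malignant) and "B" (benign)
raw_diagnosis <- c("M", "B", "B", "M", "B")

# TRUE/FALSE from the comparison becomes 1/0
diagnosis <- as.numeric(raw_diagnosis == "M")
print(diagnosis)
```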
## radius_mean texture_mean perimeter_mean
## 1.412729e+01 1.928965e+01 9.196903e+01
## area_mean smoothness_mean compactness_mean
## 6.548891e+02 9.636028e-02 1.043410e-01
## concavity_mean concave.points_mean symmetry_mean
## 8.879932e-02 4.891915e-02 1.811619e-01
## fractal_dimension_mean radius_se texture_se
## 6.279761e-02 4.051721e-01 1.216853e+00
## perimeter_se area_se smoothness_se
## 2.866059e+00 4.033708e+01 7.040979e-03
## compactness_se concavity_se concave.points_se
## 2.547814e-02 3.189372e-02 1.179614e-02
## symmetry_se fractal_dimension_se radius_worst
## 2.054230e-02 3.794904e-03 1.626919e+01
## texture_worst perimeter_worst area_worst
## 2.567722e+01 1.072612e+02 8.805831e+02
## smoothness_worst compactness_worst concavity_worst
## 1.323686e-01 2.542650e-01 2.721885e-01
## concave.points_worst symmetry_worst fractal_dimension_worst
## 1.146062e-01 2.900756e-01 8.394582e-02
## radius_mean texture_mean perimeter_mean
## 3.524049e+00 4.301036e+00 2.429898e+01
## area_mean smoothness_mean compactness_mean
## 3.519141e+02 1.406413e-02 5.281276e-02
## concavity_mean concave.points_mean symmetry_mean
## 7.971981e-02 3.880284e-02 2.741428e-02
## fractal_dimension_mean radius_se texture_se
## 7.060363e-03 2.773127e-01 5.516484e-01
## perimeter_se area_se smoothness_se
## 2.021855e+00 4.549101e+01 3.002518e-03
## compactness_se concavity_se concave.points_se
## 1.790818e-02 3.018606e-02 6.170285e-03
## symmetry_se fractal_dimension_se radius_worst
## 8.266372e-03 2.646071e-03 4.833242e+00
## texture_worst perimeter_worst area_worst
## 6.146258e+00 3.360254e+01 5.693570e+02
## smoothness_worst compactness_worst concavity_worst
## 2.283243e-02 1.573365e-01 2.086243e-01
## concave.points_worst symmetry_worst fractal_dimension_worst
## 6.573234e-02 6.186747e-02 1.806127e-02
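The two blocks of output above look like per-column means followed by per-column standard deviations (an assumption, since the code that produced them is not shown). The features differ by orders of magnitude: area_mean averages about 655 while smoothness_se averages about 0.007, which is the usual reason to scale the data before PCA. A minimal sketch on a small made-up matrix, where `toy` and its values are invented stand-ins for `wisc.data`:

```r
# Hypothetical stand-in for wisc.data: 5 observations, 3 features on
# very different scales. The values are invented for illustration.
toy <- cbind(
  area       = c(1001, 1326, 1203, 386, 1297),
  radius     = c(18.0, 20.6, 19.7, 11.4, 20.3),
  smoothness = c(0.118, 0.085, 0.110, 0.143, 0.100)
)

# Per-column means and standard deviations
col_means <- colMeans(toy)
col_sds   <- apply(toy, 2, sd)
print(col_means)
print(col_sds)

# scale() centers each column to mean 0 and rescales it to sd 1,
# so no single feature dominates the principal components.
toy_scaled <- scale(toy)
print(round(colMeans(toy_scaled), 10))
print(apply(toy_scaled, 2, sd))
```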
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 3.6444 2.3857 1.67867 1.40735 1.28403 1.09880
## Proportion of Variance 0.4427 0.1897 0.09393 0.06602 0.05496 0.04025
## Cumulative Proportion 0.4427 0.6324 0.72636 0.79239 0.84734 0.88759
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 0.82172 0.69037 0.6457 0.59219 0.5421 0.51104
## Proportion of Variance 0.02251 0.01589 0.0139 0.01169 0.0098 0.00871
## Cumulative Proportion 0.91010 0.92598 0.9399 0.95157 0.9614 0.97007
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 0.49128 0.39624 0.30681 0.28260 0.24372 0.22939
## Proportion of Variance 0.00805 0.00523 0.00314 0.00266 0.00198 0.00175
## Cumulative Proportion 0.97812 0.98335 0.98649 0.98915 0.99113 0.99288
## PC19 PC20 PC21 PC22 PC23 PC24
## Standard deviation 0.22244 0.17652 0.1731 0.16565 0.15602 0.1344
## Proportion of Variance 0.00165 0.00104 0.0010 0.00091 0.00081 0.0006
## Cumulative Proportion 0.99453 0.99557 0.9966 0.99749 0.99830 0.9989
## PC25 PC26 PC27 PC28 PC29 PC30
## Standard deviation 0.12442 0.09043 0.08307 0.03987 0.02736 0.01153
## Proportion of Variance 0.00052 0.00027 0.00023 0.00005 0.00002 0.00000
## Cumulative Proportion 0.99942 0.99969 0.99992 0.99997 1.00000 1.00000
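The proportions in the summary above can be reproduced from the standard deviations. The sketch below assumes the PCA was run on features scaled to unit variance, so that the total variance equals the number of features (30). This is an inference from the numbers, not something the source states.

```r
# Standard deviations of the first seven PCs, copied from the summary above
sdev <- c(3.6444, 2.3857, 1.67867, 1.40735, 1.28403, 1.09880, 0.82172)

# Assumption: features were scaled to unit variance, so the total
# variance equals the number of features (30).
total_var <- 30

pve     <- sdev^2 / total_var  # proportion of variance explained per PC
cum_pve <- cumsum(pve)         # cumulative proportion

print(round(pve, 4))
print(round(cum_pve, 4))

# Smallest number of PCs whose cumulative proportion reaches 90%
n_pcs <- which(cum_pve >= 0.90)[1]
print(n_pcs)
```

Under this assumption, seven components are needed to pass 90% of the variance, in line with the 0.91010 cumulative proportion listed for PC7.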
## PC1 PC2 PC3
## radius_mean -0.21890244 0.233857132 -0.008531243
## texture_mean -0.10372458 0.059706088 0.064549903
## perimeter_mean -0.22753729 0.215181361 -0.009314220
## area_mean -0.22099499 0.231076711 0.028699526
## smoothness_mean -0.14258969 -0.186113023 -0.104291904
## compactness_mean -0.23928535 -0.151891610 -0.074091571
## concavity_mean -0.25840048 -0.060165363 0.002733838
## concave.points_mean -0.26085376 0.034767500 -0.025563541
## symmetry_mean -0.13816696 -0.190348770 -0.040239936
## fractal_dimension_mean -0.06436335 -0.366575471 -0.022574090
## radius_se -0.20597878 0.105552152 0.268481387
## texture_se -0.01742803 -0.089979682 0.374633665
## perimeter_se -0.21132592 0.089457234 0.266645367
## area_se -0.20286964 0.152292628 0.216006528
## smoothness_se -0.01453145 -0.204430453 0.308838979
## compactness_se -0.17039345 -0.232715896 0.154779718
## concavity_se -0.15358979 -0.197207283 0.176463743
## concave.points_se -0.18341740 -0.130321560 0.224657567
## symmetry_se -0.04249842 -0.183848000 0.288584292
## fractal_dimension_se -0.10256832 -0.280092027 0.211503764
## radius_worst -0.22799663 0.219866379 -0.047506990
## texture_worst -0.10446933 0.045467298 -0.042297823
## perimeter_worst -0.23663968 0.199878428 -0.048546508
## area_worst -0.22487053 0.219351858 -0.011902318
## smoothness_worst -0.12795256 -0.172304352 -0.259797613
## compactness_worst -0.21009588 -0.143593173 -0.236075625
## concavity_worst -0.22876753 -0.097964114 -0.173057335
## concave.points_worst -0.25088597 0.008257235 -0.170344076
## symmetry_worst -0.12290456 -0.141883349 -0.271312642
## fractal_dimension_worst -0.13178394 -0.275339469 -0.232791313
##                     diagnosis
## wisc.hclust.clusters   0   1
##                    1  12 165
##                    2   2   5
##                    3 343  40
##                    4   0   2
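The code that produced the table above is not shown. A table of this shape is typically obtained by clustering the scaled data hierarchically, cutting the tree into four clusters, and cross-tabulating against the diagnosis. The sketch below runs on randomly generated toy data; `toy`, `toy_diagnosis`, and the choice of complete linkage are assumptions rather than the author's confirmed setup.

```r
set.seed(1)  # reproducible toy data

# Hypothetical stand-in for wisc.data: two noisy groups in 4 dimensions
toy <- rbind(
  matrix(rnorm(40 * 4, mean = 0), ncol = 4),
  matrix(rnorm(20 * 4, mean = 3), ncol = 4)
)
toy_diagnosis <- rep(c(0, 1), times = c(40, 20))  # 0 = benign, 1 = malignant

# Complete-linkage hierarchical clustering on scaled Euclidean distances
toy.hclust <- hclust(dist(scale(toy)), method = "complete")

# Cut the tree into 4 clusters and cross-tabulate against the labels
toy.hclust.clusters <- cutree(toy.hclust, k = 4)
toy_table <- table(toy.hclust.clusters, toy_diagnosis)
print(toy_table)
```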
# Create a k-means model on wisc.data: wisc.km
wisc.km <- kmeans(scale(wisc.data), centers = 2, nstart = 20)
##    diagnosis
##       0   1
##   1  14 175
##   2 343  37
##
## wisc.hclust.clusters   1   2
##                    1 160  17
##                    2   7   0
##                    3  20 363
##                    4   2   0
Our table suggests that clusters 1, 2, and 4 from the hierarchical clustering model correspond to cluster 1 from the k-means algorithm, and that cluster 3 corresponds to cluster 2.
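The correspondence can be checked directly from the counts in the cross-tabulation. The sketch below re-enters those counts by hand and, for each hierarchical cluster, picks the k-means cluster that holds most of its observations. The names `cross`, `best_match`, and `agreement` are illustrative.

```r
# Cross-tabulation copied from the output above:
# rows = hierarchical clusters 1-4, columns = k-means clusters 1-2
cross <- matrix(
  c(160,  17,
      7,   0,
     20, 363,
      2,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(hclust = as.character(1:4), kmeans = as.character(1:2))
)

# For each hierarchical cluster, the k-means cluster holding most of its points
best_match <- apply(cross, 1, which.max)
print(best_match)

# Share of all observations that fall in the matched cells
agreement <- sum(apply(cross, 1, max)) / sum(cross)
print(round(agreement, 3))
```

With this mapping, 532 of the 569 observations (about 93.5%) are grouped consistently by the two models.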
# Create a hierarchical clustering model: wisc.pr.hclust
wisc.pr.hclust <- hclust(dist(wisc.pr$x[, 1:7]), method = "complete")
# Cut model into 4 clusters: wisc.pr.hclust.clusters
wisc.pr.hclust.clusters <- cutree(wisc.pr.hclust, k = 4)
##          wisc.pr.hclust.clusters
## diagnosis   1   2   3   4
##         0   5 350   2   0
##         1 113  97   0   2
##          wisc.hclust.clusters
## diagnosis   1   2   3   4
##         0  12   2 343   0
##         1 165   5  40   2
##
## diagnosis   1   2
##         0  14 343
##         1 175  37
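One way to compare the three models is to turn each table into a sensitivity (share of malignant cases found) and a specificity (share of benign cases found). The sketch below re-enters the counts from the three tables above and assumes a simple rule that is not taken from the source: a cluster is read as malignant when it contains more malignant than benign cases. The helper names `make_tab` and `sens_spec` are hypothetical.

```r
# Build a diagnosis-by-cluster table (rows: 0 = benign, 1 = malignant)
make_tab <- function(values, k) {
  matrix(values, nrow = 2, byrow = TRUE,
         dimnames = list(diagnosis = c("0", "1"),
                         cluster = as.character(seq_len(k))))
}

# Counts copied from the three tables above
tabs <- list(
  pca_hclust = make_tab(c(5, 350, 2, 0, 113, 97, 0, 2), 4),
  hclust     = make_tab(c(12, 2, 343, 0, 165, 5, 40, 2), 4),
  kmeans     = make_tab(c(14, 343, 175, 37), 2)
)

# Assumption: a cluster is read as "malignant" when it holds more
# malignant than benign cases.
sens_spec <- function(tab) {
  is_malignant <- tab["1", ] > tab["0", ]
  c(sensitivity = sum(tab["1", is_malignant]) / sum(tab["1", ]),
    specificity = sum(tab["0", !is_malignant]) / sum(tab["0", ]))
}

results <- t(sapply(tabs, sens_spec))
print(round(results, 3))
```

Under that rule, k-means and the hierarchical model on the raw features recover roughly 83% and 81% of the malignant cases, while the PCA-based hierarchical model recovers about 54%, because its cluster 2 absorbs 97 malignant cases alongside 350 benign ones.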