Predicting Breast Cancer Using a Deep Learning Neural Network

Introduction:

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: “Robust Linear Programming Discrimination of Two Linearly Inseparable Sets”, Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:

ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found in the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension (“coastline approximation” - 1)

The mean, standard error and “worst” or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
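The field numbering above follows a regular pattern: the ten base measurements repeat in three blocks (mean, standard error, worst), starting at field 3. A minimal Python sketch of that layout follows. The report's own output appears to come from R, so the language, the `field_number` helper, and the generated names are illustrative assumptions rather than the author's code.

```python
# Ten base measurements, in the order given in the attribute description.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
# Three statistics per measurement, stored as consecutive blocks of ten fields.
STATISTICS = ["mean", "se", "worst"]


def field_number(feature, statistic):
    """Return the 1-based field number (hypothetical helper).

    Fields 1 and 2 hold the ID and the diagnosis, so features start at 3.
    """
    block = STATISTICS.index(statistic) * len(BASE_FEATURES)
    return 3 + block + BASE_FEATURES.index(feature)


column_names = [f"{feat}_{stat}" for stat in STATISTICS for feat in BASE_FEATURES]

print(len(column_names))                # 30
print(field_number("radius", "mean"))   # 3
print(field_number("radius", "se"))     # 13
print(field_number("radius", "worst"))  # 23
```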

All feature values are recoded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant.
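Because the classes are unbalanced, the share of the larger class gives a useful reference point for any accuracy figure. The short Python sketch below works it out from the counts above; reading it as a majority-class baseline is an interpretation added here, not something the source states.

```python
# Class counts as given in the data set description.
class_counts = {"benign": 357, "malignant": 212}

total = sum(class_counts.values())
shares = {label: 100 * count / total for label, count in class_counts.items()}

# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(shares.values())

print(total)                        # 569
print(round(majority_baseline, 2))  # 62.74
```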

Breast Cancer Wisconsin (Diagnostic) Data Set

## Classes 'tbl_df', 'tbl' and 'data.frame':    569 obs. of  32 variables:
##  $ id                     : num  842302 842517 84300903 84348301 84358402 ...
##  $ diagnosis              : chr  "M" "M" "M" "M" ...
##  $ radius_mean            : num  18 20.6 19.7 11.4 20.3 ...
##  $ texture_mean           : num  10.4 17.8 21.2 20.4 14.3 ...
##  $ perimeter_mean         : num  122.8 132.9 130 77.6 135.1 ...
##  $ area_mean              : num  1001 1326 1203 386 1297 ...
##  $ smoothness_mean        : num  0.1184 0.0847 0.1096 0.1425 0.1003 ...
##  $ compactness_mean       : num  0.2776 0.0786 0.1599 0.2839 0.1328 ...
##  $ concavity_mean         : num  0.3001 0.0869 0.1974 0.2414 0.198 ...
##  $ concave points_mean    : num  0.1471 0.0702 0.1279 0.1052 0.1043 ...
##  $ symmetry_mean          : num  0.242 0.181 0.207 0.26 0.181 ...
##  $ fractal_dimension_mean : num  0.0787 0.0567 0.06 0.0974 0.0588 ...
##  $ radius_se              : num  1.095 0.543 0.746 0.496 0.757 ...
##  $ texture_se             : num  0.905 0.734 0.787 1.156 0.781 ...
##  $ perimeter_se           : num  8.59 3.4 4.58 3.44 5.44 ...
##  $ area_se                : num  153.4 74.1 94 27.2 94.4 ...
##  $ smoothness_se          : num  0.0064 0.00522 0.00615 0.00911 0.01149 ...
##  $ compactness_se         : num  0.049 0.0131 0.0401 0.0746 0.0246 ...
##  $ concavity_se           : num  0.0537 0.0186 0.0383 0.0566 0.0569 ...
##  $ concave points_se      : num  0.0159 0.0134 0.0206 0.0187 0.0188 ...
##  $ symmetry_se            : num  0.03 0.0139 0.0225 0.0596 0.0176 ...
##  $ fractal_dimension_se   : num  0.00619 0.00353 0.00457 0.00921 0.00511 ...
##  $ radius_worst           : num  25.4 25 23.6 14.9 22.5 ...
##  $ texture_worst          : num  17.3 23.4 25.5 26.5 16.7 ...
##  $ perimeter_worst        : num  184.6 158.8 152.5 98.9 152.2 ...
##  $ area_worst             : num  2019 1956 1709 568 1575 ...
##  $ smoothness_worst       : num  0.162 0.124 0.144 0.21 0.137 ...
##  $ compactness_worst      : num  0.666 0.187 0.424 0.866 0.205 ...
##  $ concavity_worst        : num  0.712 0.242 0.45 0.687 0.4 ...
##  $ concave points_worst   : num  0.265 0.186 0.243 0.258 0.163 ...
##  $ symmetry_worst         : num  0.46 0.275 0.361 0.664 0.236 ...
##  $ fractal_dimension_worst: num  0.1189 0.089 0.0876 0.173 0.0768 ...

Preprocessing the Data Set:

Independent Training Set

##  num [1:455, 1:31] 9.17e-04 9.58e-04 4.14e-07 9.94e-02 9.59e-04 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:455] "7" "231" "404" "418" ...
##   ..$ : chr [1:31] "id" "radius_mean" "texture_mean" "perimeter_mean" ...

Independent Testing Set

##  num [1:114, 1:31] 0.000915 0.092564 0.000918 0.000919 0.009331 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:114] "1" "4" "9" "14" ...
##   ..$ : chr [1:31] "id" "radius_mean" "texture_mean" "perimeter_mean" ...

Dependent Training Set

##  num [1:455, 1:2] 0 0 1 0 1 1 1 1 1 1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:455] "7" "231" "404" "418" ...
##   ..$ : chr [1:2] "1" "2"

Dependent Testing Set

##  num [1:114, 1:2] 0 0 0 0 1 1 0 1 1 1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:114] "1" "4" "9" "14" ...
##   ..$ : chr [1:2] "1" "2"
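The shapes above are consistent with an 80/20 split of the 569 rows (455 training, 114 testing) and with the diagnosis being expanded into two indicator columns. The sketch below reproduces those two steps in plain Python. The class order (B first, M second), the seed, and the helper names are assumptions; the source does not state how the split or the encoding was actually done.

```python
import random


def one_hot(labels, classes=("B", "M")):
    """Encode each label as an indicator row; the class order is an assumption."""
    index = {c: i for i, c in enumerate(classes)}
    return [[1 if index[label] == j else 0 for j in range(len(classes))]
            for label in labels]


def split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(569)
print(len(train_idx), len(test_idx))  # 455 114
print(one_hot(["M", "M", "B"]))       # [[0, 1], [0, 1], [1, 0]]
```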

Neural Network Model Architecture:

## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense_1 (Dense)                  (None, 128)                   4096        
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 128)                   0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 2)                     258         
## ===========================================================================
## Total params: 4,354
## Trainable params: 4,354
## Non-trainable params: 0
## ___________________________________________________________________________
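The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output unit, and a dropout layer has no parameters. A short Python check follows, assuming the first layer receives the 31 input columns shown in the preprocessing output (which include `id`); `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_params(31, 128)  # dense_1: 31 inputs -> 128 units
output = dense_params(128, 2)   # dense_2: 128 inputs -> 2 units

print(hidden, output, hidden + output)  # 4096 258 4354
```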

Training the Neural Network Model:

## Trained on 364 samples, validated on 91 samples (batch_size=1, epochs=100)
## Final epoch (plot to see history):
## val_loss: 0.4327
##  val_acc: 0.9341
##     loss: 0.02001
##      acc: 0.9918
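The 364/91 figures are what a 20% validation hold-out of the 455 training rows would give, and a batch size of 1 means one weight update per training sample. The arithmetic is sketched below in Python; the 20% figure is inferred from the counts, not stated in the source.

```python
import math

n_train_rows = 455
validation_percent = 20  # assumption, inferred from the 364/91 split
batch_size = 1
epochs = 100

# Integer arithmetic avoids floating-point rounding in the split sizes.
n_validation = n_train_rows * validation_percent // 100
n_fit = n_train_rows - n_validation

updates_per_epoch = math.ceil(n_fit / batch_size)
total_updates = updates_per_epoch * epochs

print(n_fit, n_validation)  # 364 91
print(total_updates)        # 36400
```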

Predict Using the Testing Dataset:

	0	1
0	109	5
1	5	109

Evaluate the Model’s Performance:

Accuracy (%)
95.61
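The reported accuracy follows from the table in the previous section as the diagonal total divided by the grand total. The Python sketch below recomputes it. The table's entries sum to 228, twice the 114 test rows, which suggests (as an assumption, not something the source states) that it counts the cells of the two-column indicator matrices rather than individual samples. The ratio is the same either way, since 109 correct out of 114 also gives 95.61%.

```python
# Counts copied from the prediction table; which axis is predicted and
# which is actual is not stated, and the table is symmetric in any case.
confusion = [
    [109, 5],
    [5, 109],
]

correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy_percent = 100 * correct / total

print(total)                       # 228
print(round(accuracy_percent, 2))  # 95.61
```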