Abstract

Early diagnosis of any disease plays a critical role in successful treatment of patients. Every year thousands of patients are diagnosed with breast cancer. Though large amounts of patient clinical data is collected and stored every year, but a small subset of predictive factors are used in determining outcomes.

In this project, we use a data mining approach to diagnose breast cancer. The data-driven approach used here can efficiently process clinical dataset to discover patterns and reveal hidden information for early detection and successfully treatment of breast cancer patients.

Introduction and Background

Breast cancer is one of the most severe cancers. It has taken hundreds of thousands of lives every year. Early prediction of breast cancer plays an important role in successful treatment and saving lives of thousands of patients every year. However, the conventional approaches are limited in providing such capability. The recent breakthrough of data analytics and data mining techniques have opened a new door for healthcare diagnostic and prediction.

Over the past decades medical records and clinical data have been collected and stored in electronic databases. Both the government and other public organizations have accelerated the technology toward transparency by making massively stored data usable, searchable, and actionable. Despite of the massive healthcare databases available, only small part of the data has been used by domain-experts for diagnostic and cure of diseases. This is because the massive healthcare data is too complex and voluminous to be effectively and efficiently processed and analyzed by the conventional methods.

Most breast cancers are detected by patients as a lump in the breast. These lumps can be benign or malignant. It is the physician’s responsibility to diagnose the cancer and determine whether it is benign or malignant.

There are different ways of diagnosing breast cancer. For example, mammography and surgical biopsy. Our main purpose in this project is to predict if the cells (tumor) are benign or malignant (cancer), based on dimensional characteristics of the nuclei cells. As secondary objectives we want to know if the quality of our prediction is similar to the scientific articles and if linear models are still competitive against nonlinear models, based on rules / trees.

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. FNA’s involves taking one or more samples of breast cells using a fine needle and syringe. They describe characteristics of the cell nuclei present in the image.

The mean, standard error, and “worst” or largest (mean of the three largest values) of these dimensional features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.

Data Source

Data repository UC-Irvine, Breast Cancer Wisconsin (Diagnostic) Data Set from University of Wisconsin, Clinical Sciences Center.

Methodology

Data Exploration

Variable description

ID number
diagnosis (M = malignant, B = benign), this we changed for (1 = malignant, 0 = benign).

Ten real-valued features are computed for each cell nucleus:

radius (mean of distances from center to points on the perimeter)
texture (standard deviation of gray-scale values)
perimeter (the distance around a shape)
area
smoothness (local variation in radius lengths)
compactness (\((perimeter^2 / area) - 1.0\))
concavity (severity of concave portions of the contour)
concave points (number of concave portions of the contour)
symmetry
fractal dimension (coastline approximation - 1)

For each variable we have 3 different statistics: Mean, standard error and worst case. We have a total of 30 predictors.

The change diagnosis predictor from malignant for 1, benign for 0, help us to see relationship beetwen the response and the predictors, and was done in this phase.

Data Analysis

Below is the structure and the summary of the dataset.

Selected Stats
	vars	n	mean	sd	median	min	max	range	skew	kurtosis
diagnosis	1	569	0.37	0.48	0.00	0.00	1.00	1.00	0.53	-1.73
radius_mean	2	569	14.13	3.52	13.37	6.98	28.11	21.13	0.94	0.81
texture_mean	3	569	19.29	4.30	18.84	9.71	39.28	29.57	0.65	0.73
perimeter_mean	4	569	91.97	24.30	86.24	43.79	188.50	144.71	0.99	0.94
area_mean	5	569	654.89	351.91	551.10	143.50	2501.00	2357.50	1.64	3.59
smoothness_mean	6	569	0.10	0.01	0.10	0.05	0.16	0.11	0.45	0.82
compactness_mean	7	569	0.10	0.05	0.09	0.02	0.35	0.33	1.18	1.61
concavity_mean	8	569	0.09	0.08	0.06	0.00	0.43	0.43	1.39	1.95
concave.points_mean	9	569	0.05	0.04	0.03	0.00	0.20	0.20	1.17	1.03
symmetry_mean	10	569	0.18	0.03	0.18	0.11	0.30	0.20	0.72	1.25
fractal_dimension_mean	11	569	0.06	0.01	0.06	0.05	0.10	0.05	1.30	2.95
radius_se	12	569	0.41	0.28	0.32	0.11	2.87	2.76	3.07	17.45
texture_se	13	569	1.22	0.55	1.11	0.36	4.88	4.52	1.64	5.26
perimeter_se	14	569	2.87	2.02	2.29	0.76	21.98	21.22	3.43	21.12
area_se	15	569	40.34	45.49	24.53	6.80	542.20	535.40	5.42	48.59
smoothness_se	16	569	0.01	0.00	0.01	0.00	0.03	0.03	2.30	10.32
compactness_se	17	569	0.03	0.02	0.02	0.00	0.14	0.13	1.89	5.02
concavity_se	18	569	0.03	0.03	0.03	0.00	0.40	0.40	5.08	48.24
concave.points_se	19	569	0.01	0.01	0.01	0.00	0.05	0.05	1.44	5.04
symmetry_se	20	569	0.02	0.01	0.02	0.01	0.08	0.07	2.18	7.78
fractal_dimension_se	21	569	0.00	0.00	0.00	0.00	0.03	0.03	3.90	25.94
radius_worst	22	569	16.27	4.83	14.97	7.93	36.04	28.11	1.10	0.91
texture_worst	23	569	25.68	6.15	25.41	12.02	49.54	37.52	0.50	0.20
perimeter_worst	24	569	107.26	33.60	97.66	50.41	251.20	200.79	1.12	1.04
area_worst	25	569	880.58	569.36	686.50	185.20	4254.00	4068.80	1.85	4.32
smoothness_worst	26	569	0.13	0.02	0.13	0.07	0.22	0.15	0.41	0.49
compactness_worst	27	569	0.25	0.16	0.21	0.03	1.06	1.03	1.47	2.98
concavity_worst	28	569	0.27	0.21	0.23	0.00	1.25	1.25	1.14	1.57
concave.points_worst	29	569	0.11	0.07	0.10	0.00	0.29	0.29	0.49	-0.55
symmetry_worst	30	569	0.29	0.06	0.28	0.16	0.66	0.51	1.43	4.37
fractal_dimension_worst	31	569	0.08	0.02	0.08	0.06	0.21	0.15	1.65	5.16

Below are the inference of the summary.

diagonsis is the target variable.
myCol and id are index. It is not required in predictors.
There is no NA in all the predictors. Imputation is not required.
Each variable is in different scale.
Each cell nuclei has different properties like radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry_mean, fractal dimenstion. Dataset also has Mean, standard error and the worst measure of that particular cell.

As a next step we will remove the unwanted variables and analyze individual set of dimensions.

Data Visualization

Histogram

As a first visualization, we will plot the histogram of all the predictor variables.

Histograms

BOX-PLOT

Lets split the predictors according to its category

BOX-PLOT

Some inference from the chart

Nuclei mean of the perimeter and area is higher.
Standard Error of the Area is higher.
In worst nuclei scenario, area has extremly high values.

Correlation matrix

Now lets plot the correlation plots to understand more about the correlated predictors.

Below is the overall correlation matrix of all the predictors

Correlation list (only high correlation values, r>0.9)

Diagnosis Plots

Lets deepdive into response variable and see its distribution.

Summary of data explorations findings:

Most histograms present very asymmetric behavior with similar to exponential distribution.
Some predictor look like exponential distribuition as radius_se, perimeter_se, area_se, concavity_se and fractal_dimension_se.
There is no true outliers, the outliers at box-plot is due the kind of distribution.
There is no missing.
we identified 21 pairs of highly correlated predictors, r> 0.9, this was due to the choice of predictors that are associated, measures things related: radius, perimeter and area.
There are 14 predictors related with the response, Diagnosis, with r>=0.6, this is a good news.

Data Preparation

To solve the highly correlated variables, we will follow two types.

Predictors transformation and remove correlated variables.
PCA transformed correlated variables.

Predictor transformations and removal of correlated variables

In this method following steps will be performed:

Solve high correlations values, the follow predictors were removed: area_mean, radius_mean, area_worst, compactness_mean, perimeter_worst, compactness_se. concavity_worstand fractal_dimension_worst.
Verify the data behavior of power transformation for the follow predictors: radius_se, perimeter_se, area_se and fractal_dimension_se.

There are some predictor variables which are skwed. We can perform power transformations on those variables.

As we saw in the chart, figure 3, the power transformation, has improved the distribution.

Now we will remove the highly correlated variables and then split the data in two data set, Train data set with 70% and Test data set with 30% of all data.

PCA transformation

In this type, as the variables are highly correlated, we will transform the predictors using principal component analysis(PCA). PCA will provide the transformed variables.

Above summary shows that 10 PCA’s shows the 95% of variation. And 17 PCA’s shows 99.1% of variation in the dataset. We will select 17 PCA’s to show the variation in the data. It also has a clusters of around 7. Now lets see the clusters in detail.

MODELS

The analyzed models are divided into two families:

Linear models: Logistic regression, Probit regression

Nonlinear models: Cubist Model, Neural Network Model and SVM (Support Vector Machine)

Linear Models

Logistic Regression

Logistic Regression is used to estimate the probability of a binary response based on one ore more predictors. The model itself simply models probability of output in terms of input, and does not perform statistical classification (it is not a classifier), though it can be used to make a classifier, for instance by choosing a cutoff value and classifying inputs with probability greater than the cutoff as one class, below the cutoff as the other.

Now we will build different models and compare the results.

As a initial set of models, we will try logistic regression with different variations in it.

Logistic Regression Model(Logit) with all variables

As a first model, we will build logistic regression model which has all the predictors with correlated variables..

## 
## Call:
## glm(formula = trainfull_y ~ ., family = binomial, data = trainfull_x)
## 
## Deviance Residuals: 
##        Min          1Q      Median          3Q         Max  
## -4.085e-04  -2.000e-08  -2.000e-08   2.000e-08   4.427e-04  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)
## (Intercept)             -4.125e+03  1.062e+06  -0.004    0.997
## radius_mean             -1.689e+03  1.323e+05  -0.013    0.990
## texture_mean             2.697e+01  8.738e+03   0.003    0.998
## perimeter_mean           2.606e+02  2.497e+04   0.010    0.992
## area_mean               -1.522e+00  7.836e+02  -0.002    0.998
## smoothness_mean         -2.641e+03  9.064e+05  -0.003    0.998
## compactness_mean        -1.526e+04  1.898e+06  -0.008    0.994
## concavity_mean          -1.625e+02  8.434e+05   0.000    1.000
## concave.points_mean      8.913e+03  2.021e+06   0.004    0.996
## symmetry_mean           -1.826e+03  2.701e+05  -0.007    0.995
## fractal_dimension_mean   2.986e+04  2.587e+06   0.012    0.991
## radius_se                3.444e+03  6.046e+05   0.006    0.995
## texture_se               4.161e+01  6.254e+04   0.001    0.999
## perimeter_se            -3.411e+02  3.773e+04  -0.009    0.993
## area_se                 -3.528e+00  5.808e+03  -0.001    1.000
## smoothness_se           -8.118e+04  7.394e+06  -0.011    0.991
## compactness_se           2.619e+04  1.849e+06   0.014    0.989
## concavity_se            -1.046e+04  7.599e+05  -0.014    0.989
## concave.points_se        3.982e+04  4.285e+06   0.009    0.993
## symmetry_se              7.354e+03  1.928e+06   0.004    0.997
## fractal_dimension_se    -2.080e+05  1.870e+07  -0.011    0.991
## radius_worst            -4.495e+01  8.104e+04  -0.001    1.000
## texture_worst            3.704e+00  6.233e+03   0.001    1.000
## perimeter_worst          1.242e+01  5.608e+03   0.002    0.998
## area_worst               1.688e+00  3.940e+02   0.004    0.997
## smoothness_worst         8.448e+03  1.482e+06   0.006    0.995
## compactness_worst       -2.650e+03  2.002e+05  -0.013    0.989
## concavity_worst          2.083e+03  2.053e+05   0.010    0.992
## concave.points_worst     1.531e+03  4.357e+05   0.004    0.997
## symmetry_worst           1.064e+03  3.589e+05   0.003    0.998
## fractal_dimension_worst  1.029e+04  8.558e+05   0.012    0.990
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5.2517e+02  on 398  degrees of freedom
## Residual deviance: 1.9353e-06  on 368  degrees of freedom
## AIC: 62
## 
## Number of Fisher Scoring iterations: 25

None of the predictors are signficant due to correlated variables. This model did not provide any results and did not converge.

Logistic Regression without correlated variables

## 
## Call:
## glm(formula = train_y ~ ., family = binomial, data = train_x)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5623  -0.0088  -0.0002   0.0000   3.4069  
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)  
## (Intercept)             -46.7853    37.6301  -1.243    0.214  
## texture_mean              0.2764     0.4418   0.626    0.531  
## perimeter_mean           -0.6385     0.3766  -1.696    0.090 .
## smoothness_mean        -194.1969   163.4813  -1.188    0.235  
## concavity_mean           89.8349    63.9501   1.405    0.160  
## concave.points_mean      81.0294   138.0329   0.587    0.557  
## symmetry_mean           -29.7323    53.1203  -0.560    0.576  
## fractal_dimension_mean  114.7941   251.3729   0.457    0.648  
## radius_se                14.8761    16.7799   0.887    0.375  
## texture_se                0.5514     3.4229   0.161    0.872  
## perimeter_se              3.5570    15.2526   0.233    0.816  
## area_se                -121.8861    97.8252  -1.246    0.213  
## smoothness_se          -755.0038   857.5639  -0.880    0.379  
## concavity_se           -132.9106    93.5363  -1.421    0.155  
## concave.points_se       677.3859   533.8223   1.269    0.204  
## symmetry_se              -7.0271   254.5291  -0.028    0.978  
## fractal_dimension_se      1.8134     1.5961   1.136    0.256  
## radius_worst              4.0617     2.4981   1.626    0.104  
## texture_worst             0.1956     0.4214   0.464    0.643  
## smoothness_worst        189.4718   128.4210   1.475    0.140  
## compactness_worst       -12.2243    10.7555  -1.137    0.256  
## concave.points_worst     -4.1564    49.2015  -0.084    0.933  
## symmetry_worst           34.9436    40.2384   0.868    0.385  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 525.172  on 398  degrees of freedom
## Residual deviance:  35.976  on 376  degrees of freedom
## AIC: 81.976
## 
## Number of Fisher Scoring iterations: 11

This time the model did converge, however it seems many variables are not significant. We will perform backward stepwise selection of the model.

## 
## Call:
## glm(formula = train_y ~ perimeter_mean + concavity_mean + concavity_se + 
##     concave.points_se + fractal_dimension_se + radius_worst + 
##     texture_worst + smoothness_worst, family = binomial, data = train_x)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7767  -0.0259  -0.0016   0.0000   3.8179  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -62.0316    16.4291  -3.776 0.000160 ***
## perimeter_mean        -0.4903     0.1901  -2.580 0.009888 ** 
## concavity_mean        64.7415    22.2160   2.914 0.003566 ** 
## concavity_se         -76.3333    36.4414  -2.095 0.036199 *  
## concave.points_se    451.3646   183.1439   2.465 0.013719 *  
## fractal_dimension_se   1.7057     0.8748   1.950 0.051198 .  
## radius_worst           4.3085     1.2386   3.478 0.000504 ***
## texture_worst          0.3653     0.1013   3.604 0.000313 ***
## smoothness_worst      72.0289    33.1361   2.174 0.029726 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 525.17  on 398  degrees of freedom
## Residual deviance:  47.10  on 390  degrees of freedom
## AIC: 65.1
## 
## Number of Fisher Scoring iterations: 10

Now all the variables are statistically significant and AIC got reduced.

Following predictors were chosen: - perimeter_mean - concavity_mean - concavity_se - concave.points_se - fractal_dimension_se - radius_worst - texture_worst - smoothness_worst

Now lets try out other models.

Logistic regression on PCA variables

We can also perform logistic regression on PCA transformed variables. However, the problem is the model is not interpretable. One of the biggest advantage of Logistic regression is interpretability. By using PCA variables, we loose that advantage. But lets see how the model reacts to the PCA variables.

## 
## Call:
## glm(formula = train_y ~ . - PC11 - PC6 - PC8 - PC10 - PC5 - PC9 - 
##     PC13 - PC14 - PC12 - PC7, family = binomial, data = pca_wdbc$x[, 
##     c(1:14)] %>% data.frame())
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4551  -0.0467  -0.0028   0.0009   3.3087  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.3741     0.4019  -3.419 0.000629 ***
## PC1          -3.0425     0.5450  -5.583 2.36e-08 ***
## PC2           2.8103     0.5622   4.999 5.77e-07 ***
## PC3          -1.7361     0.4486  -3.870 0.000109 ***
## PC4          -2.5773     0.6424  -4.012 6.02e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 525.172  on 398  degrees of freedom
## Residual deviance:  57.999  on 394  degrees of freedom
## AIC: 67.999
## 
## Number of Fisher Scoring iterations: 9

PCA models are difficult to interpret. So this model does not provide any other information.

Probit Regression

In Probit Regression, the dependent variable can take only two values. The purpose of the model is to estimate the probability that an observation with a particular characteristics will fall into a specific one of the categories. A probit model is a popular specification for an ordinal or a binary response model.

Now lets perform logistic regression with probit link function. After backward stepwise eliminiation, below is the model we got.

## 
## Call:
## glm(formula = train_y ~ . - perimeter_se - texture_worst - concave.points_worst - 
##     symmetry_se - fractal_dimension_mean - symmetry_mean - compactness_worst - 
##     concavity_se - concave.points_se - concavity_mean - smoothness_se - 
##     texture_se - symmetry_worst, family = binomial(link = "probit"), 
##     data = train_x)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.819  -0.003   0.000   0.000   3.281  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -10.07635    5.98008  -1.685 0.091991 .  
## texture_mean           0.25963    0.06897   3.764 0.000167 ***
## perimeter_mean        -0.41491    0.10733  -3.866 0.000111 ***
## smoothness_mean      -81.77502   42.32285  -1.932 0.053338 .  
## concave.points_mean  131.35210   34.53616   3.803 0.000143 ***
## radius_se             13.05659    5.87379   2.223 0.026226 *  
## area_se              -84.30666   38.45030  -2.193 0.028335 *  
## fractal_dimension_se   0.66639    0.37989   1.754 0.079402 .  
## radius_worst           2.20845    0.62953   3.508 0.000451 ***
## smoothness_worst      54.55936   20.46253   2.666 0.007669 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 525.172  on 398  degrees of freedom
## Residual deviance:  45.703  on 389  degrees of freedom
## AIC: 65.703
## 
## Number of Fisher Scoring iterations: 12

Seems there are nine predictors which are significant. It has similar AIC score compared to logit model.

Linear Models Summary

Train set performance

Performance metrics train
	accuracy	auc
Logistic	0.980	0.976
Logistic PCA	0.980	0.974
Probit	0.975	0.969

Test set performance

Performance metrics test
	accuracy	auc
Logistic	0.976	0.975
Logistic PCA	0.100	0.916
Probit	0.971	0.967

Non-linear Models

Cubist Model

Cubist is a prediction-oriented regression model that combines the ideas in Quinlan (1992) and Quinlan (1993).
Although it initially creates a tree structure, it collapses each path through the tree into a rule. A regression model is fit for each rule based on the data subset defined by the rules. The set of rules are pruned or possibly combined. and the candidate variables for the linear regression models are the predictors that were used in the parts of the rule that were pruned away. This part of the algorithm is consistent with the “M5” or Model Tree approach.
Cubist generalizes this model to add boosting (when committees > 1) and instance based corrections (see predict.cubist()). The number of instances is set at prediction time by the user and is not needed for model building.

Predictor importance from cubist model

Average Neural Network Model

Average Neural Networks regression AVNNET. A Neural Network (NN) is a graph of computational units that receive inputs and transfer the result into an output that is passed on. The units are ordered into layers to connect the features of an input vector to the features of an output vector. With training, such as the Back-Propagation algorithm, neural networks can be designed and trained to model the underlying relationship in data.

## Model Averaged Neural Network 
## 
## 399 samples
##  22 predictor
##   2 classes: '0', '1' 
## 
## Pre-processing: centered (22), scaled (22) 
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 360, 360, 359, 359, 360, 359, ... 
## Resampling results across tuning parameters:
## 
##   decay  Accuracy   Kappa    
##   0.01   0.9709681  0.9375156
##   0.10   0.9750322  0.9460329
## 
## Tuning parameter 'size' was held constant at a value of 1
## Tuning
##  parameter 'bag' was held constant at a value of FALSE
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were size = 1, decay = 0.1 and bag
##  = FALSE.

SVM

Support Vector Machines (SVM) are a class of methods, developed originally for classification, that find support points that best separate classes. SVM for regression is called Support Vector Regression (SVM).

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 399 samples
##  22 predictor
##   2 classes: '0', '1' 
## 
## Pre-processing: centered (22), scaled (22) 
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 360, 360, 359, 359, 360, 359, ... 
## Resampling results across tuning parameters:
## 
##   C        Accuracy   Kappa    
##      0.25  0.9569418  0.9052807
##      0.50  0.9639931  0.9214983
##      1.00  0.9674675  0.9287819
##      2.00  0.9724681  0.9397658
##      4.00  0.9719431  0.9386826
##      8.00  0.9729431  0.9412281
##     16.00  0.9609515  0.9158910
##     32.00  0.9559619  0.9052550
##     64.00  0.9494362  0.8911838
##    128.00  0.9494362  0.8911838
##    256.00  0.9494362  0.8911838
##    512.00  0.9494362  0.8911838
##   1024.00  0.9494362  0.8911838
##   2048.00  0.9494362  0.8911838
## 
## Tuning parameter 'sigma' was held constant at a value of 0.04889095
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.04889095 and C = 8.

Linear Models Summary

Train set performance

Performance metrics train
	accuracy	auc
AvNeural	0.990	0.986
SVM	0.995	0.993
Cubist	1.000	1.000

Test set performance

Performance metrics test
	accuracy	auc
AvNeural	0.971	0.970
SVM	0.971	0.970
Cubist	0.976	0.975

Model Analysis and Diagnostics

Best Linear and Nonlinear Models

In the following section, we will describe the best model.

Logistic regression (reduced model without hightly correlated predictors)

Confusion Matrix Test set

## Confusion Matrix and Statistics
## 
##                test_y
## conv_13_logit_t   0   1
##               0 103   2
##               1   2  63
##                                           
##                Accuracy : 0.9765          
##                  95% CI : (0.9409, 0.9936)
##     No Information Rate : 0.6176          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9502          
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9692          
##             Specificity : 0.9810          
##          Pos Pred Value : 0.9692          
##          Neg Pred Value : 0.9810          
##              Prevalence : 0.3824          
##          Detection Rate : 0.3706          
##    Detection Prevalence : 0.3824          
##       Balanced Accuracy : 0.9751          
##                                           
##        'Positive' Class : 1               
##

Cubistic Regression

## Confusion Matrix and Statistics
## 
##                test_y
## conv_31_cubit_t   0   1
##               0 103   2
##               1   2  63
##                                           
##                Accuracy : 0.9765          
##                  95% CI : (0.9409, 0.9936)
##     No Information Rate : 0.6176          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9502          
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9692          
##             Specificity : 0.9810          
##          Pos Pred Value : 0.9692          
##          Neg Pred Value : 0.9810          
##              Prevalence : 0.3824          
##          Detection Rate : 0.3706          
##    Detection Prevalence : 0.3824          
##       Balanced Accuracy : 0.9751          
##                                           
##        'Positive' Class : 1               
##

Comparison with scientific paper

Simple Learning Algorithms for Training Support Vector Machines

Charles Campbell and Nello Cristianini Cover in depth the usage of the Kernel-Adatron Algorithm in their paper Simple Learning Algorithms for Training Support Vector Machines[5]. Their method was adapted from the Statistical Mechanics approach to learning[8]. The core concept of implementing this algorithm is to find hyperplanes which have optimal stability.

The authors of the paper tested different statistical models before applying the Kernel-Adaton support Vector Machine model. Non SVM models include CART, RBF, and Linear Discriminant Analysis(LDA). CART models are representative of a binary tree where each root node represents a single input variable (x) and is then split on that variable, the leaf nodes of the tree will contain the output variable (y) that will make the prediction[9]. Radial Basis Function(RBF) is a type of model that can be employed for methods that require Linear or nonlinear problems. These types of functions have been associated with a single layer and multilayer networks[10]. Linear Discriminant Analysis(LDA) use statistical properties of the data that are calculated by each class for input variables in a single variable. For multiple variables, the properties are calculated over the multivariate Gaussian[11]. Multi-Layer Neural Network(Back-Propagation) is a supervised learning technique that consists of at least three-layered nodes, where each node is a neuron with a nonlinear activation function.

To achieve the ideal model performance, SLATSVM[5] used 10 fold cross-validation and the whole dataset. The results for each model were the following CART 94.2%, RBF 95.9%, LDA 96.0%, Multi-Layer Neural Network(Back-Propagation) 96.6% with the optimal performance being 99.48%.

The optimal model was achieved using cross-validation 10 fold, Centering and scaling the variables. The model used was svmRadial applied using the caret package in R-3.3.1. The Data was split 70% training and 30% testing. The performance was 98.7% against the training set. While high in accuracy when the code was tested against the holdout dataset, the accuracy increased slightly to 98.8% percent.

The svmRadial model performed with high accuracy in the training and test set. Other parameters that were exceptional where Area Under the Curve(AUC) with a value of 99.3% and 97.0% in the test set. The AUC captures the efficiency of the model in this example 90% being an A grade.

Summary and Conclusions

From the analysis we can conclude that the cubist and the logistic models have similar performance. The logistic model seems to be the easiest to implement whereas nonlinear models like SVM have implementation complexity. In logistic models we can see the coefficient of all predictors. Although, the logistic model is more interpretable than the cubist model, the cubist model seems to be best model because the accuracy in the cubist model is the highest 0.976.

REFERENCES:
[1] Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
[2] https://stackoverflow.com/questions/7074246/show-correlations-as-an-ordered-list-not-as-a-large-matrix
[3] https://cran.r-project.org/web/packages/Cubist/Cubist.pdf
[4] Andrew I. Schein and Lyle H. Ungar. A-Optimality for Active Learning of Logistic Regression Classifiers. Department of Computer and Information Science Levine Hall.
[5] Charles Campbell and Nello Cristianini. Simple Learning Algorithms for Training Support Vector Machines. Dept. of Engineering Mathematics.
[6] O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995.
[7] Kuhn, Johnson, Applied Predictive Modeling, Springer [8] Watkin, T., Ran, A. & Biehl, M. (1993). The Statistical Mechanics of Learning a Rule, Rev. Mod. Phys. 65(2). [9]Brownlee, J., Ph.D. (2017, September 20). Classification And Regression Trees for Machine Learning. Retrieved May 20, 2018, from https://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/ [10]Mark J L Orr. (1996). Introduction to Radial Basis Function Networks. [11]Brownlee, J., Ph.D. (2016, April 6).Linear Discriminant Analysis for Machine Learning. Retrieved May 20 2018, from https://machinelearningmastery.com/linear-discriminant-analysis-for-machine-learning/

R code

knitr::opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/', echo=FALSE, warning=FALSE, message=FALSE)
#load libraries
library("xlsx")
library("psych")
library("GGally")
library("ggplot2")
library("caret")
library("dplyr")
library("pROC")
library("car")
library("leaps")
library("sjstats")
library("PerformanceAnalytics")
library("factoextra")
library("BGLR")
library("keras")
library("dplyr")
library("kableExtra")
library("Cubist")
#load dataset

url<-"https://raw.githubusercontent.com/bvshyam/Cancer_prediction/master/data/data_with groups.xlsx"

temp.file <- paste(tempfile(),".xlsx",sep = "")
download.file(url,temp.file, mode="wb")
WDBCdata<<-xlsx::read.xlsx(temp.file, sheetName = "ics data", header=T)



#WDBCdata<- read.xlsx("data/data_with groups.xlsx", sheetName = "ics data")

# First step the id will be removed to avoid any future trouble
# change M = malignant for 1 and B for B = benign for zero
WDBCdata$mycol<-NULL
WDBCdata$id<-NULL
WDBCdata$diagnosis<-ifelse(WDBCdata$diagnosis == "B", 0, 1)

# summary stats
WDBC_tbl<-describe(WDBCdata,IQR=T)[,c(1:5,8:10,11,12)]
kable(round(WDBC_tbl,2), caption = "Selected Stats", format = "html") %>% 
        kable_styling(latex_options = "striped", font_size=10)
rm(WDBC_tbl)
library("Hmisc")
hist.data.frame(WDBCdata, n.unique=1, mtitl = "Breast Cancer Histogram")
WDBCdata_mean = cbind(diagnosis=WDBCdata[,c(1)], WDBCdata[,c(2:11)])
WDBCdata_se = cbind(diagnosis=WDBCdata[,c(1)], WDBCdata[,c(12:21)])
WDBCdata_worst = cbind(diagnosis=WDBCdata[,c(1)], WDBCdata[,c(22:31)])

# boxplot
par(cex.axis=0.8) # is for x-axis
boxplot(WDBCdata_mean, las=2, col="green", main="Breast Cancer Box-Plot for Mean", ylim = c(0,150))
boxplot(WDBCdata_se, las=2, col="green", main="Breast Cancer Box-Plot for SE", ylim = c(0,150))
boxplot(WDBCdata_worst, las=2, col="green", main="Breast Cancer Box-Plot for Worst", ylim = c(0,150))

#Mean
chart.Correlation(WDBCdata_mean,histogram=TRUE,pch=19)

# SE
chart.Correlation(WDBCdata_se,histogram=TRUE,pch=19)
# Worst
chart.Correlation(WDBCdata_worst,histogram=TRUE,pch=19)
ggcorr(WDBCdata, nbreaks=8, palette='PRGn', label=TRUE, label_size=2, size = 1.8, label_color='black') + ggtitle("Breast Cancer Correlation Matrix") + theme(plot.title = element_text(hjust = 0.5, color = "grey15"))
z = cor(WDBCdata)

z = round(z,4)
z[abs(z)<0.9]=NA # remove low relationship
z[lower.tri(z,diag=TRUE)]=NA  #Prepare to drop duplicates and meaningless information
z=as.data.frame(as.table(z))  #Turn into a 3-column table
z=na.omit(z)  #Get rid of the junk we flagged above
z=z[order(-abs(z$Freq)),]    #Sort by highest correlation (whether +ve or -ve)
z
rm(z)


#qplot(as.factor(WDBCdata$diagnosis))+geom_bar() + #labs(x='Diagnosis', y ='Count')  

barplot(table(WDBCdata$diagnosis), col="blue", ylab="count", xlab="Diagnosis", main="Response distribution")



bc1<-BoxCoxTrans(WDBCdata$radius_se)
bc2<-BoxCoxTrans(WDBCdata$perimeter_se)
bc3<-BoxCoxTrans(WDBCdata$area_se )
bc5<-BoxCoxTrans(WDBCdata$fractal_dimension_se)

par(mfrow=c(1,2))
hist(WDBCdata$radius_se, main="Histogram radius_se", xlab="", col="yellow")
hist(WDBCdata$radius_se^bc1$lambda, main="Histogram radius_se transf.", xlab="", col="green")
par(mfrow=c(1,2))
hist(WDBCdata$perimeter_se, main="Histogram perimeter_se", xlab="", col="yellow")
hist(WDBCdata$perimeter_se^bc2$lambda, main="Histogram perimeter_se transf.", xlab="",col="green")
par(mfrow=c(1,2))
hist(WDBCdata$area_se, main="Histogram area_se",xlab="", col="yellow")
hist(WDBCdata$area_se^bc3$lambda, main="Histogram area_se transf.",xlab="",col="green")
par(mfrow=c(1,2))
hist(WDBCdata$fractal_dimension_se, main="Histogram dimension_se",xlab="", col="yellow")
hist(WDBCdata$fractal_dimension_se^bc5$lambda, main="Histogram dimension_se transf.",xlab="",col="green")
par(mfrow=c(1,1))


WDBCdatafull<-WDBCdata #copy data set

WDBCdata<-subset(WDBCdata, select=-c(area_mean,radius_mean,area_worst,compactness_mean,perimeter_worst,compactness_se,concavity_worst,fractal_dimension_worst)) # remove predictors


# WDBCdatafull all predictors
# WDBCdata remove highly correlated predictors


set.seed(123)
indx<-createDataPartition(WDBCdata$diagnosis, p=0.7, list=FALSE)

train_x<-WDBCdata[indx,-1]
train_y<-WDBCdata$diagnosis[indx] 

test_x<-WDBCdata[-indx,-1]
test_y<-WDBCdata$diagnosis[-indx]


# the subset bellow is with full predictors, cannot be used with logistic model, only can be used with model that accept highly correlated predictors.

trainfull_x<-WDBCdatafull[indx,-1]
trainfull_y<-WDBCdatafull$diagnosis[indx]

testfull_x<-WDBCdatafull[-indx,-1]
testfull_y<-WDBCdatafull$diagnosis[-indx]

# applying the power transformation for the test and train data set

train_x$radius_se<-train_x$radius_se^bc1$lambda
train_x$perimeter_se<-train_x$perimeter_se^bc2$lambda    
train_x$area_se<-train_x$area_se^bc3$lambda
train_x$fractal_dimension_se<-train_x$fractal_dimension_se^bc5$lambda

test_x$radius_se<-test_x$radius_se^bc1$lambda
test_x$perimeter_se<-test_x$perimeter_se^bc2$lambda    
test_x$area_se<-test_x$area_se^bc3$lambda
test_x$fractal_dimension_se<-test_x$fractal_dimension_se^bc5$lambda


pca_wdbc <- prcomp(train_x[,2:ncol(train_x)],center = TRUE, scale=TRUE)
pca_wbdc_test<-prcomp(test_x[,2:ncol(test_x)],center = TRUE, scale=TRUE)

plot(pca_wdbc, type='l', main="PCA - Principal Components Analysis Chart", col="red")

#summary(pca_wdbc$x)


pca_wdbc_var <- get_pca_var(pca_wdbc)



res <- kmeans(pca_wdbc_var$coord,centers = 5, nstart=25)
grp <- as.factor(res$cluster)

fviz_pca_var(pca_wdbc, col.var=grp, palette='jco', legend.title='Cluster')

model_11_logit_full <-glm(trainfull_y ~ . ,family=binomial, trainfull_x)
summary(model_11_logit_full)
model_12_logit_corr <-glm(train_y~.,family=binomial,data=train_x)
summary(model_12_logit_corr)

model_13_logit_corr_final <- update(model_12_logit_corr, .~.-symmetry_se-concave.points_worst-texture_se-perimeter_se-radius_se-symmetry_mean-fractal_dimension_mean-concave.points_mean-texture_mean-compactness_worst-symmetry_worst-smoothness_mean-area_se-smoothness_se )
summary(model_13_logit_corr_final)

model_14_pca <-glm( train_y~.-PC11 -PC6 -PC8 -PC10 -PC5 -PC9 -PC13 -PC14 -PC12 -PC7,family=binomial,data=pca_wdbc$x[,c(1:14)] %>% data.frame())
summary(model_14_pca)

model_15_probit <-glm( train_y~.-perimeter_se-texture_worst-concave.points_worst-symmetry_se-fractal_dimension_mean-symmetry_mean-compactness_worst-concavity_se-concave.points_se-concavity_mean-smoothness_se-texture_se-symmetry_worst,family=binomial(link = 'probit'),data=train_x)
summary(model_15_probit)
#Convert to 0/1
conv_13_logit_corr <- ifelse(predict(model_13_logit_corr_final) > 0.5,1,0)
#conf_13_logit_corr <- confusionMatrix(conv_13_logit_corr, train_y, positive="1")
conf_13_logit_corr <- confusionMatrix(table(conv_13_logit_corr, train_y), positive = "1")

conv_14_pca <- ifelse(predict(model_14_pca) > 0.5,1,0)
#conf_14_pca <- confusionMatrix(conv_14_pca, train_y, positive="1")
conf_14_pca <- confusionMatrix(table(conv_14_pca, train_y), positive="1")

conv_15_probit <- ifelse(predict(model_15_probit ) > 0.5,1,0)
#conf_15_probit <- confusionMatrix(conv_15_probit,train_y,positive="1")
conf_15_probit <- confusionMatrix(table(conv_15_probit, train_y), positive="1")

# compute accuracy

acc13<-conf_13_logit_corr$overall["Accuracy"]
acc14<-conf_14_pca$overall["Accuracy"]
acc15<-conf_15_probit$overall["Accuracy"]

# compute AUC

auc13<-roc(train_y ~ conv_13_logit_corr, train_x)$auc
auc14<-roc(train_y ~ conv_14_pca, train_x)$auc
auc15<-roc(train_y ~ conv_15_probit, train_x)$auc

df<-data.frame(accuracy=c(acc13,acc14,acc15),auc=c(auc13,auc14,auc15))
row.names(df)<-c("Logistic","Logistic PCA","Probit")


kable(round(df,3), caption = "Performance metrics train")        

#Convert to 0/1
conv_13_logit_t <- ifelse(predict(object=model_13_logit_corr_final, newdata=test_x, type="response") > 0.5,1,0)
#conf_13_logit_t <- confusionMatrix(conv_13_logit_t, test_y, positive="1")
conf_13_logit_t <- confusionMatrix(table(conv_13_logit_t, test_y), positive="1")

conv_14_pca_t <- ifelse(predict(model_14_pca, newdata=as.data.frame(pca_wbdc_test$x[,c(1:14)]), type="response") > 0.5,1,0)
#conf_14_pca_t <- confusionMatrix(conv_14_pca_t, test_y, positive="1")
conf_14_pca_t <- confusionMatrix(table(conv_14_pca_t, test_y), positive="1")

conv_15_probit_t <- ifelse(predict(model_15_probit, newdata=test_x, type="response") > 0.5,1,0)
#conf_15_probit_t <- confusionMatrix(conv_15_probit_t, test_y, positive="1")
conf_15_probit_t <- confusionMatrix(table(conv_15_probit_t, test_y), positive="1")

# compute accuracy

acc13_t<-conf_13_logit_t$overall["Accuracy"]
acc14_t<-conf_14_pca_t$overall["Accuracy"]
acc15_t<-conf_15_probit_t$overall["Accuracy"]

# compute AUC

auc13_t<-roc(test_y ~ conv_13_logit_t, test_x)$auc
auc14_t<-roc(test_y ~ conv_14_pca_t, test_x)$auc
auc15_t<-roc(test_y ~ conv_15_probit_t, test_x)$auc

df<-data.frame(accuracy=c(acc13_t,acc14_t,acc15_t),auc=c(auc13_t,auc14_t,auc15_t))
row.names(df)<-c("Logistic","Logistic PCA","Probit")

kable(round(df,3), caption = "Performance metrics test")   



set.seed(123)
ctrl=(trainControl(method="repeatedcv", repeats=5))

c<-c(100)
n<-c(3,4)

model_31_cubit<-train(train_x,train_y, method="cubist",
                preProcess = c("center", "scale"),
                tuneGrid = expand.grid(committees=c,neighbors=n),
                trControl = ctrl)

dotPlot(varImp(model_31_cubit), main="Cubist Predictor importance")

set.seed(123)

ctrl=(trainControl(method="repeatedcv", repeats=5))
nnetGrid<-expand.grid(.decay=c(0.01,0.1), .size=1, .bag=FALSE)

model_41_neural<-train(x=train_x,y=as.factor(train_y),method="avNNet",
                 tuneGrid = nnetGrid,
                 trControl = ctrl,
                 preProcess = c("center","scale"),
                 linout=TRUE,
                 trace=FALSE,
                 MaxNWts=10*(ncol(train_x)+1)+10+1,
                 maxit=500)

model_41_neural
set.seed(123)
ctrl=(trainControl(method="repeatedcv", repeats=5))

SVM_Radial_Fit = train(x=train_x,y=as.factor(train_y), method = "svmRadial",
preProc = c("center", "scale"),
tuneLength = 14,trControl = ctrl)

SVM_Radial_Fit


set.seed(123)
Chris_SVM_Pred = data.frame(Pred= predict(SVM_Radial_Fit, newdata = test_x))


plot(SVM_Radial_Fit, scales = list(x = list(log = 2)))

#Chris_Con =confusionMatrix(Chris_SVM_Pred$Pred,test_y,dnn = c("Prediction", "Reference"))

#Chris_SVM_acc = data.frame( Val =postResample(pred = Chris_SVM_Pred$Pred, obs = test_y))

#fourfoldplot(Chris_Con$table,main=paste("SVM (",round(Chris_SVM_acc[1,]*100),"%)",sep=""),color = c("#ed3b3b", "#0099ff"))

#Chris_Con
#Convert to 0/1
conv_41_neural <- predict(model_41_neural) 
#conf_41_neural<-confusionMatrix(conv_41_neural, train_y, positive="1")
conf_41_neural<-confusionMatrix(table(conv_41_neural, train_y), positive="1")

conv_21_svm <- predict(SVM_Radial_Fit)
#conf_21_svm<-confusionMatrix(conv_21_svm, train_y, positive="1")
conf_21_svm<-confusionMatrix(table(conv_21_svm, train_y), positive="1")

conv_31_cubit <- ifelse(predict(model_31_cubit) > 0.5,1,0)
#conf_31_cubit<-confusionMatrix(conv_31_cubit, train_y, positive="1")
conf_31_cubit<-confusionMatrix(table(conv_31_cubit, train_y), positive="1")

# compute accuracy

acc41<-conf_41_neural$overall["Accuracy"]
acc21<-conf_21_svm$overall["Accuracy"]
acc31<-conf_31_cubit$overall["Accuracy"]

# compute AUC

auc41<-roc(train_y ~ as.numeric(as.character(conv_41_neural)), train_x)$auc
auc21<-roc(train_y ~ as.numeric(as.character(conv_21_svm)), train_x)$auc
auc31<-roc(train_y ~ conv_31_cubit, train_x)$auc

df1<-data.frame(accuracy=c(acc41,acc21,acc31),auc=c(auc41,auc21,auc31))
row.names(df1)<-c("AvNeural","SVM","Cubist")

kable(round(df1,3), caption = "Performance metrics train")        

#Convert to 0/1
conv_41_neural_t <- predict(model_41_neural, newdata=test_x, type="raw") 
#conf_41_neural_t<-confusionMatrix(conv_41_neural_t,test_y,positive="1")
conf_41_neural_t<-confusionMatrix(table(conv_41_neural_t, test_y), positive="1")

conv_21_svm_t <- predict(SVM_Radial_Fit,newdata=test_x, type="raw")
#conf_21_svm_t<-confusionMatrix(conv_21_svm_t,test_y,positive="1")
conf_21_svm_t<-confusionMatrix(table(conv_21_svm_t, test_y), positive="1")

conv_31_cubit_t <- ifelse(predict(model_31_cubit,newdata=test_x) > 0.5,1,0)
#conf_31_cubit_t<-confusionMatrix(conv_31_cubit_t,test_y,positive="1")
conf_31_cubit_t<-confusionMatrix(table(conv_31_cubit_t, test_y), positive="1")

# compute accuracy

acc41_t<-conf_41_neural_t$overall["Accuracy"]
acc21_t<-conf_21_svm_t$overall["Accuracy"]
acc31_t<-conf_31_cubit_t$overall["Accuracy"]

# compute AUC

auc41_t<-roc(test_y ~ as.numeric(as.character(conv_41_neural_t)), test_x)$auc
auc21_t<-roc(test_y ~ as.numeric(as.character(conv_21_svm_t)), test_x)$auc
auc31_t<-roc(test_y ~ conv_31_cubit_t, test_x)$auc

df2<-data.frame(accuracy=c(acc41_t,acc21_t,acc31_t),auc=c(auc41_t,auc21_t,auc31_t))
row.names(df2)<-c("AvNeural","SVM","Cubist")

kable(round(df2,3), caption = "Performance metrics test")        


# Function for printing confusion matrix
confusion_analysis <- function(df,model){
  # Threshold value is 0.5, positive class is 1
  
  predicted = if_else(predict(model,df)>=0.5, 1,0)
  confusionMatrix(data = predicted,
                reference = df$Diagnosis,
                positive = "1")
}

# Function for calculating evaluation metrics
summary_analysis <- function(df,model){
  print(summary(model))
  print(paste0("BIC: ",BIC(model)))
  print(paste0("VIF: ",vif(model)))

  n = length(df$target)
  print(paste0("Naglekerke-pseudo-R2:",(1-exp((model$dev-model$null)/n))/(1-exp(-model$null/n))))
  print("Confusion Matrix:")
  confusion_analysis(df,model)
}



conf_13_logit_t

fourfoldplot(conf_13_logit_t$table,color = c("#ed3b3b", "#0099ff"))
RocCurve <- roc(response = test_y,
                ifelse(predict(object=model_13_logit_corr_final, newdata=test_x, type="response") > 0.5,1,0),levels = c(1,0))
plot(RocCurve, ylab = "Sensitivity", xlab = "1 - Specificity", main = "ROC Curve - Logistic", col = "red")
conf_31_cubit_t
fourfoldplot(conf_31_cubit_t$table,color = c("#ed3b3b", "#0099ff"))
rocCurve <- roc(response = test_y, predictor = predict(model_31_cubit,newdata = test_x), levels = c(1,0))
plot(rocCurve, ylab = "Sensitivity", xlab = "1 - Specificity", main = "ROC Curve - Cubist", col = "red")

Predicting the breast cancer by characteristics of the cell nuclei present in the image

Marco Siqueira Campos, Shyam BV, Christopher Estevez, Ahmed Sajjad

May 20, 2018

Abstract

Introduction and Background

Data Source

Methodology

Data Exploration

Variable description

Data Analysis

Data Visualization

Histogram

BOX-PLOT

Correlation matrix

Diagnosis Plots

Data Preparation

Predictor transformations and removal of correlated variables

PCA transformation

MODELS

Linear Models

Logistic Regression

Logistic Regression without correlated variables

Logistic regression on PCA variables

Probit Regression

Linear Models Summary

Non-linear Models

Cubist Model

Average Neural Network Model

SVM

Linear Models Summary

Model Analysis and Diagnostics

Best Linear and Nonlinear Models

Logistic regression (reduced model without hightly correlated predictors)

Cubistic Regression

Comparison with scientific paper

Simple Learning Algorithms for Training Support Vector Machines

Summary and Conclusions