Breast cancer (BC) is one of the most common cancers among women worldwide and accounts for a large share of new cancer cases and cancer-related deaths according to global statistics, making it a significant public health problem. Classification and data mining methods are an effective way to classify data, especially in the medical field, where they are widely used in diagnosis and analysis to support decisions.
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. A few of the images can be found at [Web Link]
The separating plane described above was obtained using Multisurface Method-Tree (MSM-T) [K. P. Bennett, “Decision Tree Construction Via Linear Programming.” Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method which uses linear programming to construct a decision tree. Relevant features were selected using an exhaustive search in the space of 1-4 features and 1-3 separating planes.
The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: “Robust Linear Programming Discrimination of Two Linearly Inseparable Sets”, Optimization Methods and Software 1, 1992, 23-34].
This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/
Ten real-valued features are computed for each cell nucleus:
1. radius_mean: radius (mean of distances from center to points on the perimeter)
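To make the radius definition concrete, here is a minimal sketch in base R. The boundary points are made up (eight points on a circle of radius 2), and taking the centroid of the boundary points as the center is my assumption; the source does not say how the center was determined.

```r
# Hypothetical boundary points of one nucleus: 8 points on a circle of radius 2.
theta <- seq(0, 2 * pi, length.out = 9)[-9]
perimeter <- cbind(x = 2 * cos(theta), y = 2 * sin(theta))

# Assumption: the center is the centroid of the boundary points.
center <- colMeans(perimeter)

# Radius = mean Euclidean distance from the center to each boundary point.
radius <- mean(sqrt(rowSums(sweep(perimeter, 2, center)^2)))
radius
```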
This analysis aims to identify which features are most helpful in predicting whether a tumour is malignant or benign, and to spot general trends that may aid model selection and hyperparameter selection. The goal is to classify whether the breast cancer is benign or malignant. To achieve this I have used machine learning classification methods to fit a function that can predict the discrete class of a new input.
The first step is to visually inspect the data set.
#DATA EXPLORATION
#Load dataset
data <- read.csv("data.csv")
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
View(head(data))
glimpse(data)
## Observations: 569
## Variables: 33
## $ id <int> 842302, 842517, 84300903, 84348301, 84...
## $ diagnosis <fct> M, M, M, M, M, M, M, M, M, M, M, M, M,...
## $ radius_mean <dbl> 17.990, 20.570, 19.690, 11.420, 20.290...
## $ texture_mean <dbl> 10.38, 17.77, 21.25, 20.38, 14.34, 15....
## $ perimeter_mean <dbl> 122.80, 132.90, 130.00, 77.58, 135.10,...
## $ area_mean <dbl> 1001.0, 1326.0, 1203.0, 386.1, 1297.0,...
## $ smoothness_mean <dbl> 0.11840, 0.08474, 0.10960, 0.14250, 0....
## $ compactness_mean <dbl> 0.27760, 0.07864, 0.15990, 0.28390, 0....
## $ concavity_mean <dbl> 0.30010, 0.08690, 0.19740, 0.24140, 0....
## $ concave.points_mean <dbl> 0.14710, 0.07017, 0.12790, 0.10520, 0....
## $ symmetry_mean <dbl> 0.2419, 0.1812, 0.2069, 0.2597, 0.1809...
## $ fractal_dimension_mean <dbl> 0.07871, 0.05667, 0.05999, 0.09744, 0....
## $ radius_se <dbl> 1.0950, 0.5435, 0.7456, 0.4956, 0.7572...
## $ texture_se <dbl> 0.9053, 0.7339, 0.7869, 1.1560, 0.7813...
## $ perimeter_se <dbl> 8.589, 3.398, 4.585, 3.445, 5.438, 2.2...
## $ area_se <dbl> 153.40, 74.08, 94.03, 27.23, 94.44, 27...
## $ smoothness_se <dbl> 0.006399, 0.005225, 0.006150, 0.009110...
## $ compactness_se <dbl> 0.049040, 0.013080, 0.040060, 0.074580...
## $ concavity_se <dbl> 0.05373, 0.01860, 0.03832, 0.05661, 0....
## $ concave.points_se <dbl> 0.015870, 0.013400, 0.020580, 0.018670...
## $ symmetry_se <dbl> 0.03003, 0.01389, 0.02250, 0.05963, 0....
## $ fractal_dimension_se <dbl> 0.006193, 0.003532, 0.004571, 0.009208...
## $ radius_worst <dbl> 25.38, 24.99, 23.57, 14.91, 22.54, 15....
## $ texture_worst <dbl> 17.33, 23.41, 25.53, 26.50, 16.67, 23....
## $ perimeter_worst <dbl> 184.60, 158.80, 152.50, 98.87, 152.20,...
## $ area_worst <dbl> 2019.0, 1956.0, 1709.0, 567.7, 1575.0,...
## $ smoothness_worst <dbl> 0.1622, 0.1238, 0.1444, 0.2098, 0.1374...
## $ compactness_worst <dbl> 0.6656, 0.1866, 0.4245, 0.8663, 0.2050...
## $ concavity_worst <dbl> 0.71190, 0.24160, 0.45040, 0.68690, 0....
## $ concave.points_worst <dbl> 0.26540, 0.18600, 0.24300, 0.25750, 0....
## $ symmetry_worst <dbl> 0.4601, 0.2750, 0.3613, 0.6638, 0.2364...
## $ fractal_dimension_worst <dbl> 0.11890, 0.08902, 0.08758, 0.17300, 0....
## $ X <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
#structure of the dataset
str(data)
## 'data.frame': 569 obs. of 33 variables:
## $ id : int 842302 842517 84300903 84348301 84358402 843786 844359 84458202 844981 84501001 ...
## $ diagnosis : Factor w/ 2 levels "B","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ radius_mean : num 18 20.6 19.7 11.4 20.3 ...
## $ texture_mean : num 10.4 17.8 21.2 20.4 14.3 ...
## $ perimeter_mean : num 122.8 132.9 130 77.6 135.1 ...
## $ area_mean : num 1001 1326 1203 386 1297 ...
## $ smoothness_mean : num 0.1184 0.0847 0.1096 0.1425 0.1003 ...
## $ compactness_mean : num 0.2776 0.0786 0.1599 0.2839 0.1328 ...
## $ concavity_mean : num 0.3001 0.0869 0.1974 0.2414 0.198 ...
## $ concave.points_mean : num 0.1471 0.0702 0.1279 0.1052 0.1043 ...
## $ symmetry_mean : num 0.242 0.181 0.207 0.26 0.181 ...
## $ fractal_dimension_mean : num 0.0787 0.0567 0.06 0.0974 0.0588 ...
## $ radius_se : num 1.095 0.543 0.746 0.496 0.757 ...
## $ texture_se : num 0.905 0.734 0.787 1.156 0.781 ...
## $ perimeter_se : num 8.59 3.4 4.58 3.44 5.44 ...
## $ area_se : num 153.4 74.1 94 27.2 94.4 ...
## $ smoothness_se : num 0.0064 0.00522 0.00615 0.00911 0.01149 ...
## $ compactness_se : num 0.049 0.0131 0.0401 0.0746 0.0246 ...
## $ concavity_se : num 0.0537 0.0186 0.0383 0.0566 0.0569 ...
## $ concave.points_se : num 0.0159 0.0134 0.0206 0.0187 0.0188 ...
## $ symmetry_se : num 0.03 0.0139 0.0225 0.0596 0.0176 ...
## $ fractal_dimension_se : num 0.00619 0.00353 0.00457 0.00921 0.00511 ...
## $ radius_worst : num 25.4 25 23.6 14.9 22.5 ...
## $ texture_worst : num 17.3 23.4 25.5 26.5 16.7 ...
## $ perimeter_worst : num 184.6 158.8 152.5 98.9 152.2 ...
## $ area_worst : num 2019 1956 1709 568 1575 ...
## $ smoothness_worst : num 0.162 0.124 0.144 0.21 0.137 ...
## $ compactness_worst : num 0.666 0.187 0.424 0.866 0.205 ...
## $ concavity_worst : num 0.712 0.242 0.45 0.687 0.4 ...
## $ concave.points_worst : num 0.265 0.186 0.243 0.258 0.163 ...
## $ symmetry_worst : num 0.46 0.275 0.361 0.664 0.236 ...
## $ fractal_dimension_worst: num 0.1189 0.089 0.0876 0.173 0.0768 ...
## $ X : logi NA NA NA NA NA NA ...
#dimension of data set
dim(data)
## [1] 569 33
#summary of data set
summary(data)
## id diagnosis radius_mean texture_mean
## Min. : 8670 B:357 Min. : 6.981 Min. : 9.71
## 1st Qu.: 869218 M:212 1st Qu.:11.700 1st Qu.:16.17
## Median : 906024 Median :13.370 Median :18.84
## Mean : 30371831 Mean :14.127 Mean :19.29
## 3rd Qu.: 8813129 3rd Qu.:15.780 3rd Qu.:21.80
## Max. :911320502 Max. :28.110 Max. :39.28
## perimeter_mean area_mean smoothness_mean compactness_mean
## Min. : 43.79 Min. : 143.5 Min. :0.05263 Min. :0.01938
## 1st Qu.: 75.17 1st Qu.: 420.3 1st Qu.:0.08637 1st Qu.:0.06492
## Median : 86.24 Median : 551.1 Median :0.09587 Median :0.09263
## Mean : 91.97 Mean : 654.9 Mean :0.09636 Mean :0.10434
## 3rd Qu.:104.10 3rd Qu.: 782.7 3rd Qu.:0.10530 3rd Qu.:0.13040
## Max. :188.50 Max. :2501.0 Max. :0.16340 Max. :0.34540
## concavity_mean concave.points_mean symmetry_mean
## Min. :0.00000 Min. :0.00000 Min. :0.1060
## 1st Qu.:0.02956 1st Qu.:0.02031 1st Qu.:0.1619
## Median :0.06154 Median :0.03350 Median :0.1792
## Mean :0.08880 Mean :0.04892 Mean :0.1812
## 3rd Qu.:0.13070 3rd Qu.:0.07400 3rd Qu.:0.1957
## Max. :0.42680 Max. :0.20120 Max. :0.3040
## fractal_dimension_mean radius_se texture_se perimeter_se
## Min. :0.04996 Min. :0.1115 Min. :0.3602 Min. : 0.757
## 1st Qu.:0.05770 1st Qu.:0.2324 1st Qu.:0.8339 1st Qu.: 1.606
## Median :0.06154 Median :0.3242 Median :1.1080 Median : 2.287
## Mean :0.06280 Mean :0.4052 Mean :1.2169 Mean : 2.866
## 3rd Qu.:0.06612 3rd Qu.:0.4789 3rd Qu.:1.4740 3rd Qu.: 3.357
## Max. :0.09744 Max. :2.8730 Max. :4.8850 Max. :21.980
## area_se smoothness_se compactness_se concavity_se
## Min. : 6.802 Min. :0.001713 Min. :0.002252 Min. :0.00000
## 1st Qu.: 17.850 1st Qu.:0.005169 1st Qu.:0.013080 1st Qu.:0.01509
## Median : 24.530 Median :0.006380 Median :0.020450 Median :0.02589
## Mean : 40.337 Mean :0.007041 Mean :0.025478 Mean :0.03189
## 3rd Qu.: 45.190 3rd Qu.:0.008146 3rd Qu.:0.032450 3rd Qu.:0.04205
## Max. :542.200 Max. :0.031130 Max. :0.135400 Max. :0.39600
## concave.points_se symmetry_se fractal_dimension_se
## Min. :0.000000 Min. :0.007882 Min. :0.0008948
## 1st Qu.:0.007638 1st Qu.:0.015160 1st Qu.:0.0022480
## Median :0.010930 Median :0.018730 Median :0.0031870
## Mean :0.011796 Mean :0.020542 Mean :0.0037949
## 3rd Qu.:0.014710 3rd Qu.:0.023480 3rd Qu.:0.0045580
## Max. :0.052790 Max. :0.078950 Max. :0.0298400
## radius_worst texture_worst perimeter_worst area_worst
## Min. : 7.93 Min. :12.02 Min. : 50.41 Min. : 185.2
## 1st Qu.:13.01 1st Qu.:21.08 1st Qu.: 84.11 1st Qu.: 515.3
## Median :14.97 Median :25.41 Median : 97.66 Median : 686.5
## Mean :16.27 Mean :25.68 Mean :107.26 Mean : 880.6
## 3rd Qu.:18.79 3rd Qu.:29.72 3rd Qu.:125.40 3rd Qu.:1084.0
## Max. :36.04 Max. :49.54 Max. :251.20 Max. :4254.0
## smoothness_worst compactness_worst concavity_worst concave.points_worst
## Min. :0.07117 Min. :0.02729 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.11660 1st Qu.:0.14720 1st Qu.:0.1145 1st Qu.:0.06493
## Median :0.13130 Median :0.21190 Median :0.2267 Median :0.09993
## Mean :0.13237 Mean :0.25427 Mean :0.2722 Mean :0.11461
## 3rd Qu.:0.14600 3rd Qu.:0.33910 3rd Qu.:0.3829 3rd Qu.:0.16140
## Max. :0.22260 Max. :1.05800 Max. :1.2520 Max. :0.29100
## symmetry_worst fractal_dimension_worst X
## Min. :0.1565 Min. :0.05504 Mode:logical
## 1st Qu.:0.2504 1st Qu.:0.07146 NA's:569
## Median :0.2822 Median :0.08004
## Mean :0.2901 Mean :0.08395
## 3rd Qu.:0.3179 3rd Qu.:0.09208
## Max. :0.6638 Max. :0.20750
#remove the empty column X (column 33, all NA)
data<-data[-33]
summary(data)
## id diagnosis radius_mean texture_mean
## Min. : 8670 B:357 Min. : 6.981 Min. : 9.71
## 1st Qu.: 869218 M:212 1st Qu.:11.700 1st Qu.:16.17
## Median : 906024 Median :13.370 Median :18.84
## Mean : 30371831 Mean :14.127 Mean :19.29
## 3rd Qu.: 8813129 3rd Qu.:15.780 3rd Qu.:21.80
## Max. :911320502 Max. :28.110 Max. :39.28
## perimeter_mean area_mean smoothness_mean compactness_mean
## Min. : 43.79 Min. : 143.5 Min. :0.05263 Min. :0.01938
## 1st Qu.: 75.17 1st Qu.: 420.3 1st Qu.:0.08637 1st Qu.:0.06492
## Median : 86.24 Median : 551.1 Median :0.09587 Median :0.09263
## Mean : 91.97 Mean : 654.9 Mean :0.09636 Mean :0.10434
## 3rd Qu.:104.10 3rd Qu.: 782.7 3rd Qu.:0.10530 3rd Qu.:0.13040
## Max. :188.50 Max. :2501.0 Max. :0.16340 Max. :0.34540
## concavity_mean concave.points_mean symmetry_mean
## Min. :0.00000 Min. :0.00000 Min. :0.1060
## 1st Qu.:0.02956 1st Qu.:0.02031 1st Qu.:0.1619
## Median :0.06154 Median :0.03350 Median :0.1792
## Mean :0.08880 Mean :0.04892 Mean :0.1812
## 3rd Qu.:0.13070 3rd Qu.:0.07400 3rd Qu.:0.1957
## Max. :0.42680 Max. :0.20120 Max. :0.3040
## fractal_dimension_mean radius_se texture_se perimeter_se
## Min. :0.04996 Min. :0.1115 Min. :0.3602 Min. : 0.757
## 1st Qu.:0.05770 1st Qu.:0.2324 1st Qu.:0.8339 1st Qu.: 1.606
## Median :0.06154 Median :0.3242 Median :1.1080 Median : 2.287
## Mean :0.06280 Mean :0.4052 Mean :1.2169 Mean : 2.866
## 3rd Qu.:0.06612 3rd Qu.:0.4789 3rd Qu.:1.4740 3rd Qu.: 3.357
## Max. :0.09744 Max. :2.8730 Max. :4.8850 Max. :21.980
## area_se smoothness_se compactness_se concavity_se
## Min. : 6.802 Min. :0.001713 Min. :0.002252 Min. :0.00000
## 1st Qu.: 17.850 1st Qu.:0.005169 1st Qu.:0.013080 1st Qu.:0.01509
## Median : 24.530 Median :0.006380 Median :0.020450 Median :0.02589
## Mean : 40.337 Mean :0.007041 Mean :0.025478 Mean :0.03189
## 3rd Qu.: 45.190 3rd Qu.:0.008146 3rd Qu.:0.032450 3rd Qu.:0.04205
## Max. :542.200 Max. :0.031130 Max. :0.135400 Max. :0.39600
## concave.points_se symmetry_se fractal_dimension_se
## Min. :0.000000 Min. :0.007882 Min. :0.0008948
## 1st Qu.:0.007638 1st Qu.:0.015160 1st Qu.:0.0022480
## Median :0.010930 Median :0.018730 Median :0.0031870
## Mean :0.011796 Mean :0.020542 Mean :0.0037949
## 3rd Qu.:0.014710 3rd Qu.:0.023480 3rd Qu.:0.0045580
## Max. :0.052790 Max. :0.078950 Max. :0.0298400
## radius_worst texture_worst perimeter_worst area_worst
## Min. : 7.93 Min. :12.02 Min. : 50.41 Min. : 185.2
## 1st Qu.:13.01 1st Qu.:21.08 1st Qu.: 84.11 1st Qu.: 515.3
## Median :14.97 Median :25.41 Median : 97.66 Median : 686.5
## Mean :16.27 Mean :25.68 Mean :107.26 Mean : 880.6
## 3rd Qu.:18.79 3rd Qu.:29.72 3rd Qu.:125.40 3rd Qu.:1084.0
## Max. :36.04 Max. :49.54 Max. :251.20 Max. :4254.0
## smoothness_worst compactness_worst concavity_worst concave.points_worst
## Min. :0.07117 Min. :0.02729 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.11660 1st Qu.:0.14720 1st Qu.:0.1145 1st Qu.:0.06493
## Median :0.13130 Median :0.21190 Median :0.2267 Median :0.09993
## Mean :0.13237 Mean :0.25427 Mean :0.2722 Mean :0.11461
## 3rd Qu.:0.14600 3rd Qu.:0.33910 3rd Qu.:0.3829 3rd Qu.:0.16140
## Max. :0.22260 Max. :1.05800 Max. :1.2520 Max. :0.29100
## symmetry_worst fractal_dimension_worst
## Min. :0.1565 Min. :0.05504
## 1st Qu.:0.2504 1st Qu.:0.07146
## Median :0.2822 Median :0.08004
## Mean :0.2901 Mean :0.08395
## 3rd Qu.:0.3179 3rd Qu.:0.09208
## Max. :0.6638 Max. :0.20750
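Dropping the empty column by position works here because X happens to be column 33. As a sketch of a position-independent alternative, the all-NA columns can be detected and removed by content. The `toy` data frame below is made up and only stands in for the loaded CSV.

```r
# Toy data frame standing in for the loaded CSV (values are made up).
toy <- data.frame(
  id = 1:4,
  radius_mean = c(17.9, 20.5, 11.4, 13.0),
  X = c(NA, NA, NA, NA)
)

# TRUE for every column whose values are all missing.
all_na <- vapply(toy, function(col) all(is.na(col)), logical(1))

# Keep only the columns that contain at least one observed value.
toy_clean <- toy[, !all_na, drop = FALSE]
names(toy_clean)
```

The same two lines applied to `data` would remove X regardless of where it sits in the file.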
data %>% count(diagnosis)
## # A tibble: 2 x 2
## diagnosis n
## <fct> <int>
## 1 B 357
## 2 M 212
data %>% count(diagnosis) %>% group_by(diagnosis) %>%
  summarize(perc_dx = round((n / nrow(data)) * 100, 2))
## # A tibble: 2 x 2
## diagnosis perc_dx
## <fct> <dbl>
## 1 B 62.7
## 2 M 37.3
diagnosis.table <- table(data$diagnosis)
colors <- terrain.colors(2)
# Create a pie chart
diagnosis.prop.table <- prop.table(diagnosis.table)*100
diagnosis.prop.df <- as.data.frame(diagnosis.prop.table)
pielabels <- sprintf("%s - %3.1f%s", diagnosis.prop.df[,1], diagnosis.prop.table, "%")
pie(diagnosis.prop.table,
labels=pielabels,
clockwise=TRUE,
col=colors,
border="gainsboro",
radius=0.8,
cex=0.8,
main="frequency of cancer diagnosis")
legend(1, .4, legend=diagnosis.prop.df[,1], cex = 0.7, fill = colors)
library(ggplot2)
ggplot(data=data,aes(x=diagnosis,y=radius_mean))+geom_boxplot(fill="pink")+ggtitle("radius of Benign Vs Malignant")
ggplot(data=data,aes(x=diagnosis,y=area_mean))+geom_boxplot()+ggtitle("area of Benign Vs Malignant")
ggplot(data=data,aes(x=diagnosis,y=concavity_mean))+geom_boxplot()+ggtitle("concavity of Benign Vs Malignant")
The boxplots show that malignant cells tend to have a higher mean radius, area and concavity than benign cells.
ggplot(data,aes(x=diagnosis,fill=diagnosis))+geom_bar()+ggtitle("number of benign and malignant cases")
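The visual impression from the boxplots can be backed with group medians. This is a minimal base R sketch on a made-up `toy` data frame (six invented rows, not the real data); on the real data the same `aggregate()` call would use `data` instead.

```r
# Made-up rows standing in for the real data set.
toy <- data.frame(
  diagnosis = factor(c("B", "B", "B", "M", "M", "M")),
  radius_mean = c(11.4, 12.5, 13.0, 18.0, 20.6, 19.7),
  area_mean = c(386, 480, 520, 1001, 1326, 1203)
)

# Median of each feature within each diagnosis group.
group_medians <- aggregate(cbind(radius_mean, area_mean) ~ diagnosis,
                           data = toy, FUN = median)
group_medians
```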
sel_data=data[data$radius_mean>10&
data$radius_mean<15&
data$compactness_mean>0.1,]
ggplot(sel_data,aes(x=diagnosis,y=radius_mean,fill=diagnosis))+geom_col()+ggtitle("summed radius mean by diagnosis (radius 10 to 15, compactness > 0.1)")
ggplot(data,aes(x=texture_mean,fill=as.factor(diagnosis)))+geom_density(alpha=0.4)+ggtitle(" texture mean for benign vs malignant")
ggplot(data,aes(x=as.factor(diagnosis),y=perimeter_mean))+geom_violin()+ggtitle(" perimeter mean for benign vs malignant")
data1=data%>%filter(concavity_mean>0.2)
ggplot(data1,aes(x=concavity_mean,y=diagnosis,size=perimeter_se))+geom_point()+ggtitle("concavity mean for benign vs malignant")
ggplot(data, aes(x = area_se>15, fill = diagnosis)) +geom_bar(position = "fill")+ggtitle("area se for benign vs malignant")
ggplot(data,aes(x=concavity_mean,fill=diagnosis))+geom_histogram(binwidth=0.02)+ggtitle(" concavity mean for benign vs malignant")
ggplot(data, aes(x = texture_se)) +
geom_histogram(binwidth=0.25) +
facet_wrap(~ diagnosis)+ggtitle(" texture se for benign vs malignant")
ggplot(data, aes(x = perimeter_mean)) +
geom_histogram(binwidth=10) +
facet_wrap(~ diagnosis)+ggtitle(" perimeter mean for benign vs malignant")
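A histogram's bin width has to match the scale of the variable being plotted. According to the summary above, concavity_mean runs from 0 to about 0.43, while perimeter_mean runs from about 44 to 189, so one width cannot serve both. The helper below is a hypothetical convenience function (not part of ggplot2) that derives a width from the observed range. The sample values are made up.

```r
# Hypothetical helper: bin width that gives roughly `bins` bars over the range of x.
suggest_binwidth <- function(x, bins = 30) {
  diff(range(x, na.rm = TRUE)) / bins
}

# Made-up values on the scale of concavity_mean (0 to about 0.43).
concavity <- c(0.00, 0.03, 0.06, 0.09, 0.13, 0.20, 0.30, 0.43)
bw <- suggest_binwidth(concavity)
bw
```

The result can be passed to `geom_histogram(binwidth = bw)`.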
In this section I will:
1. Train the algorithm on the first part,
2. Make predictions on the second part, and
3. Evaluate the predictions against the expected results.
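The first step needs a train/test split that keeps the benign/malignant ratio similar in both parts. As far as I understand, `sample.split()` from caTools does this. The base R sketch below does it by hand on made-up labels (20 benign, 12 malignant), with the 0.65 ratio used in this analysis.

```r
set.seed(123)

# Made-up class labels standing in for data$diagnosis.
labels <- factor(rep(c("B", "M"), times = c(20, 12)))

# Draw 65% of the row indices within each class separately.
train_idx <- unlist(lapply(split(seq_along(labels), labels), function(idx) {
  idx[sample.int(length(idx), size = round(0.65 * length(idx)))]
}))

# Logical vector: TRUE for training rows, FALSE for test rows.
is_train <- seq_along(labels) %in% train_idx
table(labels, is_train)
```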
library(caTools)
data$diagnosis<-factor(data$diagnosis,levels=c("B","M"),labels=c(0,1))
set.seed(123)
split=sample.split(data$diagnosis,SplitRatio=0.65)
training_set<-subset(data,split==T)
View(training_set)
test_set<-subset(data,split==F)
View(test_set)
training_set[,3:32]<-scale(training_set[,3:32])
View(training_set)
test_set[,3:32]<-scale(test_set[,3:32])
View(test_set)
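Above, the test set is scaled with its own means and standard deviations. A common alternative is to reuse the centering and scaling values from the training set, so that both sets go through the same transformation and nothing about the test set leaks into preprocessing. The sketch below shows this on made-up numbers. `scale()` stores the values it used in the `scaled:center` and `scaled:scale` attributes.

```r
# Made-up training and test features.
train <- data.frame(radius_mean = c(10, 12, 14, 16),
                    area_mean = c(300, 500, 700, 900))
test <- data.frame(radius_mean = c(11, 15),
                   area_mean = c(400, 800))

# Scale the training set and remember the parameters that were used.
train_scaled <- scale(train)
center <- attr(train_scaled, "scaled:center")
spread <- attr(train_scaled, "scaled:scale")

# Apply the training parameters to the test set.
test_scaled <- scale(test, center = center, scale = spread)
test_scaled
```

Applied to this analysis, the same pattern would use columns 3:32 of `training_set` and `test_set`. I have not rerun the model with this change, so the results reported below are for the original scaling.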
reg<-glm(formula=diagnosis~ .,family=quasibinomial(),data=training_set)
## Warning: glm.fit: algorithm did not converge
summary(reg)
##
## Call:
## glm(formula = diagnosis ~ ., family = quasibinomial(), data = training_set)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -9.429e-05 -2.100e-08 -2.100e-08 2.100e-08 1.208e-04
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.449e+00 8.343e-01 -4.134 4.50e-05 ***
## id 1.082e-07 1.980e-09 54.641 < 2e-16 ***
## radius_mean -3.229e+02 2.003e+01 -16.120 < 2e-16 ***
## texture_mean 4.486e+01 5.274e-01 85.049 < 2e-16 ***
## perimeter_mean 5.784e+02 1.267e+01 45.658 < 2e-16 ***
## area_mean -3.233e+02 1.237e+01 -26.145 < 2e-16 ***
## smoothness_mean 3.435e+01 4.869e-01 70.548 < 2e-16 ***
## compactness_mean -1.725e+02 1.612e+00 -107.061 < 2e-16 ***
## concavity_mean -2.358e+01 2.002e+00 -11.777 < 2e-16 ***
## concave.points_mean 1.083e+02 3.078e+00 35.178 < 2e-16 ***
## symmetry_mean -4.042e+01 7.221e-01 -55.976 < 2e-16 ***
## fractal_dimension_mean 2.238e+00 6.437e-01 3.477 0.000574 ***
## radius_se 1.242e+02 3.505e+00 35.431 < 2e-16 ***
## texture_se -1.402e+00 4.364e-01 -3.213 0.001439 **
## perimeter_se -3.235e+01 2.090e+00 -15.477 < 2e-16 ***
## area_se -3.336e+01 4.455e+00 -7.489 6.09e-13 ***
## smoothness_se -2.412e+01 4.684e-01 -51.503 < 2e-16 ***
## compactness_se 5.370e+01 8.409e-01 63.855 < 2e-16 ***
## concavity_se -9.459e+01 9.038e-01 -104.654 < 2e-16 ***
## concave.points_se 1.017e+02 9.083e-01 111.962 < 2e-16 ***
## symmetry_se -7.320e+00 5.140e-01 -14.241 < 2e-16 ***
## fractal_dimension_se -6.932e+01 9.562e-01 -72.497 < 2e-16 ***
## radius_worst -2.509e+02 1.202e+01 -20.874 < 2e-16 ***
## texture_worst 1.240e+01 6.738e-01 18.404 < 2e-16 ***
## perimeter_worst 6.154e+01 7.704e+00 7.988 2.17e-14 ***
## area_worst 4.062e+02 9.885e+00 41.093 < 2e-16 ***
## smoothness_worst 1.854e+01 5.165e-01 35.893 < 2e-16 ***
## compactness_worst -4.216e+01 1.384e+00 -30.453 < 2e-16 ***
## concavity_worst 1.342e+02 1.768e+00 75.941 < 2e-16 ***
## concave.points_worst -4.460e+01 2.720e+00 -16.400 < 2e-16 ***
## symmetry_worst 5.770e+01 1.080e+00 53.450 < 2e-16 ***
## fractal_dimension_worst 4.384e+01 9.737e-01 45.031 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 4.936336e-10)
##
## Null deviance: 4.8878e+02 on 369 degrees of freedom
## Residual deviance: 1.2279e-07 on 338 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 25
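Two things stand out in this output. The formula `diagnosis ~ .` also uses `id` as a predictor, and the convergence warning together with a residual deviance near zero usually points to (near-)perfect separation of the classes in the training set. The sketch below is run on simulated data only, not on the breast cancer data. It shows how an identifier column can be excluded in the formula and how to check convergence with the plain `binomial()` family. All variable names and coefficients in it are made up.

```r
set.seed(1)
n <- 200

# Simulated data: an identifier plus two informative predictors.
toy <- data.frame(id = seq_len(n), x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.2 * toy$x1 - 0.8 * toy$x2))

# "- id" removes the identifier from the predictors selected by ".".
fit <- glm(y ~ . - id, family = binomial(), data = toy)

fit$converged      # TRUE when the fitting algorithm converged
names(coef(fit))   # id is not among the coefficients
```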
prob_pred<-predict(object=reg,type="response",newdata=test_set[-2])
View(prob_pred)
y_pred<-ifelse(prob_pred>0.5,1,0)
View(y_pred)
tab<-table(test_set[,2],y_pred)
tab
## y_pred
## 0 1
## 0 121 4
## 1 5 69
acc<-sum(diag(tab))/sum(tab)
acc
## [1] 0.9547739
err<-1-acc
err
## [1] 0.04522613
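Accuracy alone hides how the errors are distributed between the two classes. Taking the counts from the confusion matrix above (rows are the actual class, columns the predicted class, 1 = malignant), sensitivity and specificity can be computed as follows. The matrix is typed in by hand here so that the sketch runs on its own.

```r
# Counts copied from the confusion matrix printed above.
tab <- matrix(c(121, 5, 4, 69), nrow = 2,
              dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

accuracy <- sum(diag(tab)) / sum(tab)            # (121 + 69) / 199
sensitivity <- tab["1", "1"] / sum(tab["1", ])   # malignant cases found: 69 / 74
specificity <- tab["0", "0"] / sum(tab["0", ])   # benign cases found: 121 / 125

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 3)
```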
The feature analysis shows that a few features carry more predictive value for the diagnosis than the rest. The logistic regression model fitted on the scaled data gives good results on the test set: it reaches an accuracy of 0.955, and from the confusion matrix above its sensitivity is 69/74, or about 0.932.