Data Set Information: 2126 fetal cardiotocograms (CTGs) were automatically processed and the respective diagnostic features measured. The CTGs were also classified by three expert obstetricians and a consensus classification label assigned to each of them. NSP: 1. Normal, 2. Suspected, 3. pathology.
data <- read.csv("file:///C:/Users/badal/Desktop/datset_/Cardiotocographic.csv")
head(data)
str(data)
'data.frame': 2126 obs. of 22 variables:
$ LB : int 120 132 133 134 132 134 134 122 122 122 ...
$ AC : num 0 0.00638 0.00332 0.00256 0.00651 ...
$ FM : num 0 0 0 0 0 0 0 0 0 0 ...
$ UC : num 0 0.00638 0.00831 0.00768 0.00814 ...
$ DL : num 0 0.00319 0.00332 0.00256 0 ...
$ DS : num 0 0 0 0 0 0 0 0 0 0 ...
$ DP : num 0 0 0 0 0 ...
$ ASTV : int 73 17 16 16 16 26 29 83 84 86 ...
$ MSTV : num 0.5 2.1 2.1 2.4 2.4 5.9 6.3 0.5 0.5 0.3 ...
$ ALTV : int 43 0 0 0 0 0 0 6 5 6 ...
$ MLTV : num 2.4 10.4 13.4 23 19.9 0 0 15.6 13.6 10.6 ...
$ Width : int 64 130 130 117 117 150 150 68 68 68 ...
$ Min : int 62 68 68 53 53 50 50 62 62 62 ...
$ Max : int 126 198 198 170 170 200 200 130 130 130 ...
$ Nmax : int 2 6 5 11 9 5 6 0 0 1 ...
$ Nzeros : int 0 1 1 0 0 3 3 0 0 0 ...
$ Mode : int 120 141 141 137 137 76 71 122 122 122 ...
$ Mean : int 137 136 135 134 136 107 107 122 122 122 ...
$ Median : int 121 140 138 137 138 107 106 123 123 123 ...
$ Variance: int 73 12 13 13 11 170 215 3 3 1 ...
$ Tendency: int 1 0 0 1 1 0 0 1 1 1 ...
$ NSP : int 2 1 1 1 1 3 3 3 3 3 ...
summary(data)
LB AC FM UC DL
Min. :106.0 Min. :0.000000 Min. :0.000000 Min. :0.000000 Min. :0.000000
1st Qu.:126.0 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.001876 1st Qu.:0.000000
Median :133.0 Median :0.001630 Median :0.000000 Median :0.004482 Median :0.000000
Mean :133.3 Mean :0.003170 Mean :0.009474 Mean :0.004357 Mean :0.001885
3rd Qu.:140.0 3rd Qu.:0.005631 3rd Qu.:0.002512 3rd Qu.:0.006525 3rd Qu.:0.003264
Max. :160.0 Max. :0.019284 Max. :0.480634 Max. :0.014925 Max. :0.015385
DS DP ASTV MSTV ALTV
Min. :0.000e+00 Min. :0.0000000 Min. :12.00 Min. :0.200 Min. : 0.000
1st Qu.:0.000e+00 1st Qu.:0.0000000 1st Qu.:32.00 1st Qu.:0.700 1st Qu.: 0.000
Median :0.000e+00 Median :0.0000000 Median :49.00 Median :1.200 Median : 0.000
Mean :3.585e-06 Mean :0.0001566 Mean :46.99 Mean :1.333 Mean : 9.847
3rd Qu.:0.000e+00 3rd Qu.:0.0000000 3rd Qu.:61.00 3rd Qu.:1.700 3rd Qu.:11.000
Max. :1.353e-03 Max. :0.0053476 Max. :87.00 Max. :7.000 Max. :91.000
MLTV Width Min Max Nmax Nzeros
Min. : 0.000 Min. : 3.00 Min. : 50.00 Min. :122 Min. : 0.000 Min. : 0.0000
1st Qu.: 4.600 1st Qu.: 37.00 1st Qu.: 67.00 1st Qu.:152 1st Qu.: 2.000 1st Qu.: 0.0000
Median : 7.400 Median : 67.50 Median : 93.00 Median :162 Median : 3.000 Median : 0.0000
Mean : 8.188 Mean : 70.45 Mean : 93.58 Mean :164 Mean : 4.068 Mean : 0.3236
3rd Qu.:10.800 3rd Qu.:100.00 3rd Qu.:120.00 3rd Qu.:174 3rd Qu.: 6.000 3rd Qu.: 0.0000
Max. :50.700 Max. :180.00 Max. :159.00 Max. :238 Max. :18.000 Max. :10.0000
Mode Mean Median Variance Tendency NSP
Min. : 60.0 Min. : 73.0 Min. : 77.0 Min. : 0.00 Min. :-1.0000 Min. :1.000
1st Qu.:129.0 1st Qu.:125.0 1st Qu.:129.0 1st Qu.: 2.00 1st Qu.: 0.0000 1st Qu.:1.000
Median :139.0 Median :136.0 Median :139.0 Median : 7.00 Median : 0.0000 Median :1.000
Mean :137.5 Mean :134.6 Mean :138.1 Mean : 18.81 Mean : 0.3203 Mean :1.304
3rd Qu.:148.0 3rd Qu.:145.0 3rd Qu.:148.0 3rd Qu.: 24.00 3rd Qu.: 1.0000 3rd Qu.:1.000
Max. :187.0 Max. :182.0 Max. :186.0 Max. :269.00 Max. : 1.0000 Max. :3.000
any(is.na(data))
[1] FALSE
data$NSP <- factor(data$NSP)
partition data into traning and validation sets
set.seed(1234)
index <- sample(2, nrow(data), replace = T, prob = c(0.80, 0.20))
train <- data[index==1,]
validate <- data[index==2,]
library(party)
package 㤼㸱party㤼㸲 was built under R version 3.6.1Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
package 㤼㸱strucchange㤼㸲 was built under R version 3.6.1Loading required package: zoo
package 㤼㸱zoo㤼㸲 was built under R version 3.6.1
Attaching package: 㤼㸱zoo㤼㸲
The following objects are masked from 㤼㸱package:base㤼㸲:
as.Date, as.Date.numeric
Loading required package: sandwich
package 㤼㸱sandwich㤼㸲 was built under R version 3.6.1
tree <- ctree(NSP~LB+AC+FM, data = train, controls = ctree_control(mincriterion=0.99, minsplit=500))
tree
Conditional inference tree with 5 terminal nodes
Response: NSP
Inputs: LB, AC, FM
Number of observations: 1718
1) AC <= 0.000834028; criterion = 1, statistic = 263.403
2) LB <= 136; criterion = 1, statistic = 131.511
3)* weights = 416
2) LB > 136
4)* weights = 314
1) AC > 0.000834028
5) AC <= 0.002209945; criterion = 1, statistic = 52.155
6)* weights = 188
5) AC > 0.002209945
7) LB <= 110; criterion = 1, statistic = 18.889
8)* weights = 18
7) LB > 110
9)* weights = 782
plot(tree)
Predict
pred <- predict(tree,validate)
pred
[1] 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1
[50] 1 2 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 1 1 1 2 2 1 1 2 2 2 1 2 2 1 2 2 1 2 2 1 1 1 2 2 1 1 1 1 1 1 1
[99] 1 1 1 2 1 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 2 1 1 1 2 1 1 1
[148] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 2 1 2 2 2 2 1 2 1 2 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1
[197] 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1
[246] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 2 1 2 1 1 1 2 1 1 1 1 2 2 2
[295] 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[344] 1 1 1 1 1 1 2 2 1 1 2 2 2 1 1 1 1 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[393] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
Levels: 1 2 3
Decision tree with R-part
library(rpart)
library(rpart.plot)
package 㤼㸱rpart.plot㤼㸲 was built under R version 3.6.1
tree1 <- rpart(NSP~LB+AC+FM, train)
rpart.plot(tree1)
rpart.plot(tree1, extra = 1)
rpart.plot(tree1, extra = 2)
pred2 <- predict(tree1, validate)
head(pred2,15)
1 2 3
5 0.9453441 0.03340081 0.02125506
14 0.9453441 0.03340081 0.02125506
16 0.9453441 0.03340081 0.02125506
26 0.7111111 0.09135802 0.19753086
28 0.7111111 0.09135802 0.19753086
29 0.7111111 0.09135802 0.19753086
39 0.9453441 0.03340081 0.02125506
40 0.9453441 0.03340081 0.02125506
60 0.9453441 0.03340081 0.02125506
61 0.9453441 0.03340081 0.02125506
72 0.9453441 0.03340081 0.02125506
81 0.2584270 0.60674157 0.13483146
86 0.9453441 0.03340081 0.02125506
90 0.9453441 0.03340081 0.02125506
92 0.3523810 0.59047619 0.05714286
table <- table(predict(tree), train$NSP)
table
1 2 3
1 1222 70 112
2 126 156 32
3 0 0 0
sum(diag(table))/sum(table)
[1] 0.8020955
1-sum(diag(table))/sum(table)
[1] 0.1979045
misclassification in validation set
testpred <- predict(tree, newdata = validate)
tab <- table(testpred, validate$NSP)
tab
testpred 1 2 3
1 274 21 28
2 33 48 4
3 0 0 0
1-sum(diag(tab))/sum(tab)
[1] 0.2107843