Loading Data
| LB | AC | FM | UC | DL | DS | DP | ASTV | MSTV | ALTV | MLTV | Width | Min | Max | Nmax | Nzeros | Mode | Mean | Median | Variance | Tendency | NSP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 120 | 0.000000000 | 0 | 0.000000000 | 0.000000000 | 0 | 0.000000000 | 73 | 0.5 | 43 | 2.4 | 64 | 62 | 126 | 2 | 0 | 120 | 137 | 121 | 73 | 1 | 2 |
| 132 | 0.006379585 | 0 | 0.006379585 | 0.003189793 | 0 | 0.000000000 | 17 | 2.1 | 0 | 10.4 | 130 | 68 | 198 | 6 | 1 | 141 | 136 | 140 | 12 | 0 | 1 |
| 133 | 0.003322259 | 0 | 0.008305648 | 0.003322259 | 0 | 0.000000000 | 16 | 2.1 | 0 | 13.4 | 130 | 68 | 198 | 5 | 1 | 141 | 135 | 138 | 13 | 0 | 1 |
| 134 | 0.002560819 | 0 | 0.007682458 | 0.002560819 | 0 | 0.000000000 | 16 | 2.4 | 0 | 23.0 | 117 | 53 | 170 | 11 | 0 | 137 | 134 | 137 | 13 | 1 | 1 |
| 132 | 0.006514658 | 0 | 0.008143322 | 0.000000000 | 0 | 0.000000000 | 16 | 2.4 | 0 | 19.9 | 117 | 53 | 170 | 9 | 0 | 137 | 136 | 138 | 11 | 1 | 1 |
| 134 | 0.001049318 | 0 | 0.010493179 | 0.009443861 | 0 | 0.002098636 | 26 | 5.9 | 0 | 0.0 | 150 | 50 | 200 | 5 | 3 | 76 | 107 | 107 | 170 | 0 | 3 |
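The source does not show how the data were read in. A minimal sketch of loading and inspecting such a table, assuming a comma-separated file, could look like this. The file here is a temporary stand-in holding a few columns from the first three rows above, and the name `ctg` is an assumption.

```r
# Hypothetical stand-in file: the real file name and format are not given in the source.
# A few columns from the first three rows of the preview are written to a temporary CSV
# so that the snippet runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "LB,AC,MSTV,ASTV,NSP",
  "120,0.000000000,0.5,73,2",
  "132,0.006379585,2.1,17,1",
  "133,0.003322259,2.1,16,1"
), csv_path)

ctg <- read.csv(csv_path)   # `ctg` is an assumed name for the data frame
str(ctg)                    # column names and types
head(ctg)                   # first rows of the data
```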
Attribute Information about the CTG dataset
LB: Baseline fetal heart rate (beats per minute)
AC: Number of accelerations per second
FM: Number of fetal movements per second
UC: Number of uterine contractions per second
DL: Number of light decelerations per second
DS: Number of severe decelerations per second
DP: Number of prolonged decelerations per second
ASTV: Percentage of time with abnormal short term variability
MSTV: Mean value of short term variability
ALTV: Percentage of time with abnormal long term variability
MLTV: Mean value of long term variability
Width: Width of FHR (Fetal Heart Rate) histogram
Min: Minimum of FHR histogram
Max: Maximum of FHR histogram
Nmax: Number of histogram peaks
Nzeros: Number of histogram zeros
Mode: Histogram mode
Mean: Histogram mean
Median: Histogram median
Variance: Histogram variance
Tendency: Histogram tendency
NSP: Fetal state class code (Normal (N) = 1; Suspect (S) = 2; Pathology (P) = 3)
Correlation Matrix
As the correlation matrix shows, there is a fairly strong correlation between the width of the FHR histogram (Width) and DL, so let's visualize how the two vary together.
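The code that produced the correlation matrix is not shown. As a rough sketch, a Pearson correlation matrix can be computed with base R's `cor()`. The data frame `toy` below is synthetic (not the CTG values) and is built so that Width depends on DL.

```r
set.seed(1)
n <- 200

# Synthetic stand-in data, NOT the CTG measurements: Width is constructed to depend on DL
DL    <- runif(n, 0, 0.01)
Width <- 40 + 8000 * DL + rnorm(n, sd = 15)
MSTV  <- rexp(n, rate = 1)
toy   <- data.frame(Width, DL, MSTV)

cor_mat <- cor(toy)        # Pearson correlation matrix of all numeric columns
round(cor_mat, 2)
cor_mat["Width", "DL"]     # a single pairwise correlation
```

On the real data, the same call on the full data frame would give the matrix from which pairs such as Width and DL are picked out.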
The data points are clustered messily, so we need to apply a transformation.
Even after a log transform, this pair is not a good candidate for linear modeling.
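The source does not say which variable was log-transformed or how zeros were handled. Since DL is exactly 0 in several of the rows shown earlier, one possible approach is `log1p()`, sketched here on synthetic zero-inflated data.

```r
set.seed(2)
n <- 300

# Synthetic, zero-inflated stand-in for DL (NOT the CTG values)
DL    <- ifelse(runif(n) < 0.5, 0, rexp(n, rate = 300))
Width <- 50 + 4000 * DL + rnorm(n, sd = 25)

log_DL_naive <- log(DL)     # -Inf wherever DL == 0, unusable for modeling
log_DL       <- log1p(DL)   # log(1 + DL), finite at zero

# Compare the linear association before and after the transform
c(raw = cor(Width, DL), log1p = cor(Width, log_DL))
```

If the two correlations are close, as they tend to be for values this small, the transform has not made the relationship any more linear.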
Next, we check the relationship between Width and MSTV.
This pair is a good candidate for linear regression.
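To back up a visual impression like this with numbers, a simple linear model can be fitted and its R-squared inspected. This is a sketch on synthetic data; the coefficients used to generate it are arbitrary assumptions, not estimates from the CTG dataset.

```r
set.seed(3)
n <- 300

# Synthetic stand-in data with a linear Width ~ MSTV relationship (assumed, not estimated)
MSTV  <- rexp(n, rate = 0.8)
Width <- 30 + 25 * MSTV + rnorm(n, sd = 20)
toy   <- data.frame(Width, MSTV)

fit <- lm(Width ~ MSTV, data = toy)   # ordinary least squares
coef(fit)                             # intercept and slope
summary(fit)$r.squared                # share of variance explained
```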
Studying the most important risk factors for the NSP class
## n= 2126
##
## node), split, n, deviance, yval
## * denotes terminal node
##
##  1) root 2126 802.100200 1.304327
##    2) Mean>=107.5 2021 520.754100 1.227115
##      4) MSTV>=0.45 1770 208.454200 1.105085
##        8) DP< 0.002064161 1757 177.126900 1.095048
##         16) MSTV>=0.55 1636 110.515300 1.062958
##           32) AC>=0.001845025 976 5.963115 1.006148 *
##           33) AC< 0.001845025 660 96.743940 1.146970
##             66) Median< 150.5 572 55.138110 1.082168 *
##             67) Median>=150.5 88 23.590910 1.568182 *
##         17) MSTV< 0.55 121 42.148760 1.528926
##           34) ASTV< 59.5 66 11.030300 1.212121 *
##           35) ASTV>=59.5 55 16.545450 1.909091 *
##        9) DP>=0.002064161 13 7.230769 2.461538 *
##      5) MSTV< 0.45 251 100.071700 2.087649
##       10) ALTV< 68.5 210 59.623810 1.919048
##         20) ASTV< 79.5 188 33.207450 1.824468
##           40) ALTV< 7.5 21 1.809524 1.095238 *
##           41) ALTV>=7.5 167 18.826350 1.916168 *
##         21) ASTV>=79.5 22 10.363640 2.727273 *
##       11) ALTV>=68.5 41 3.902439 2.951220 *
##    3) Mean< 107.5 105 37.390480 2.790476
##      6) Max>=220.5 9 3.555556 1.222222 *
##      7) Max< 220.5 96 9.625000 2.937500 *
The tree highlights MSTV, ALTV, and ASTV as the high-risk factors, so we build our logistic regression model on these three variables.
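The printed output above (deviance and yval columns) is consistent with a regression tree, but the call that produced it is not shown. The sketch below assumes `rpart()` with `method = "anova"` and uses synthetic data. The `noise` column is a deliberately irrelevant, made-up predictor.

```r
library(rpart)
set.seed(4)
n <- 500

# Synthetic stand-in data (NOT the CTG values); `noise` is unrelated to the response
toy <- data.frame(MSTV = rexp(n), ASTV = runif(n, 10, 90), noise = rnorm(n))
toy$NSP <- 1 + (toy$ASTV > 60) + (toy$MSTV < 0.4)   # numeric score in 1..3

# Regression tree: NSP is treated as a number, as the deviance/yval output suggests
reg_tree <- rpart(NSP ~ MSTV + ASTV + noise, data = toy, method = "anova")
print(reg_tree)                    # same text layout as the output above
reg_tree$variable.importance       # importance scores, largest first
```

The importance scores give a quick ranking of which predictors drive the splits, which is one way to arrive at a shortlist of risk factors.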
We create a binary column P (Pathology) that indicates whether the NSP class is Pathology or not.
The new binary response variable has now been added successfully.
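The code for this step is not shown, so the following is only a sketch of how the indicator and a train/test split might be created. The names `p_train` and `p_test` appear in the source. The 80/20 proportion is an assumption inferred from the 1700 training rows reported in the model output out of 2126 observations, and the class frequencies of the synthetic data are made up.

```r
set.seed(5)

# Synthetic NSP codes (NOT the CTG values); class frequencies are assumed
toy <- data.frame(NSP = sample(1:3, 100, replace = TRUE, prob = c(0.78, 0.14, 0.08)))

# Binary response: 1 if the class is Pathology (NSP == 3), 0 otherwise
toy$P <- ifelse(toy$NSP == 3, 1, 0)
table(NSP = toy$NSP, P = toy$P)     # check the recoding

# Random 80/20 split (proportion assumed, see above)
train_idx <- sample(nrow(toy), size = 0.8 * nrow(toy))
p_train <- toy[train_idx, , drop = FALSE]
p_test  <- toy[-train_idx, , drop = FALSE]
```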
Logistic Regression Classification Model
##
## Call:
## glm(formula = P ~ MSTV + ALTV + ASTV, family = "binomial", data = p_train)
##
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -10.855005   0.745989 -14.551  < 2e-16 ***
## MSTV          1.231484   0.114867  10.721  < 2e-16 ***
## ALTV          0.026689   0.004759   5.608 2.05e-08 ***
## ASTV          0.111116   0.010182  10.913  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 976.84 on 1699 degrees of freedom
## Residual deviance: 663.81 on 1696 degrees of freedom
## AIC: 671.81
##
## Number of Fisher Scoring iterations: 7
All three predictors are highly significant in the model (all p-values are far below 0.001).
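To see what the fitted coefficients imply, the logistic formula can be evaluated by hand. The sketch below copies the estimates from the summary above. The helper `predict_pathology()` is hypothetical and not part of the original analysis, and the two inputs are taken from the second and sixth rows of the data preview.

```r
# Coefficient estimates copied from the glm summary above
b <- c(intercept = -10.855005, MSTV = 1.231484, ALTV = 0.026689, ASTV = 0.111116)

# Hypothetical helper: probability of Pathology for given predictor values
predict_pathology <- function(MSTV, ALTV, ASTV) {
  eta <- b[["intercept"]] + b[["MSTV"]] * MSTV + b[["ALTV"]] * ALTV + b[["ASTV"]] * ASTV
  plogis(eta)   # inverse logit: 1 / (1 + exp(-eta))
}

odds_ratios <- exp(b[-1])   # multiplicative change in odds per one-unit increase
odds_ratios

p_row2 <- predict_pathology(MSTV = 2.1, ALTV = 0, ASTV = 17)   # second preview row (NSP = 1)
p_row6 <- predict_pathology(MSTV = 5.9, ALTV = 0, ASTV = 26)   # sixth preview row (NSP = 3)
c(row2 = p_row2, row6 = p_row6)
```

With a fitted model object at hand, `predict(model, newdata, type = "response")` returns the same probabilities without copying coefficients by hand.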
Decision Tree Model for NSP Class
Model1
Because ASTV, MSTV, and ALTV are strongly correlated with the NSP class, we use these three variables as predictors in the first tree model.
tree_model1 <- rpart(NSP ~ ASTV + ALTV + MSTV, data = p_train, method = "class")
rpart.plot(tree_model1, main = "Decision Tree for NSP Classification")
Model2
Referring to the correlation matrix, AC and UC also show strong negative correlations with the NSP class. Visually:
tree_model2 <- rpart(NSP ~ AC+UC,data=p_train, method = "class")
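rpart.plot(tree_model2, main = "Decision Tree for NSP Classification")
# Note: for a tree fitted with method = "class", predict() returns a matrix of class
# probabilities by default; `interval` is not an argument of predict.rpart and has no effect.
test2 <- predict(tree_model2, p_test, interval = "prediction")
# The value below is therefore computed against class probabilities, not predicted labels.
RSME <- sqrt(mean((p_test$NSP - predict(tree_model2, p_test))^2))
## [1] "RSME of tree model2:"
## [1] 1.197712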
The RMSE is only slightly smaller than that of model 1, so model 2 is not a meaningfully better fit.
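For a tree fitted with `method = "class"`, a confusion matrix and accuracy are a common alternative to RMSE. The sketch below is not the author's evaluation. It uses synthetic data and shows the difference between the default probability output of `predict()` and `type = "class"`.

```r
library(rpart)
set.seed(6)
n <- 600

# Synthetic stand-in data (NOT the CTG values): labels are a made-up function of AC and UC
toy <- data.frame(AC = runif(n, 0, 0.02), UC = runif(n, 0, 0.015))
toy$NSP <- factor(ifelse(toy$AC > 0.005, 1, ifelse(toy$UC > 0.007, 2, 3)))

idx   <- sample(n, 0.8 * n)
train <- toy[idx, ]
test  <- toy[-idx, ]

fit  <- rpart(NSP ~ AC + UC, data = train, method = "class")
prob <- predict(fit, test)                   # default: one probability column per class
pred <- predict(fit, test, type = "class")   # factor of predicted class labels

dim(prob)
conf_mat <- table(actual = test$NSP, predicted = pred)
conf_mat
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)   # share of correctly classified rows
accuracy
```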
Model3
Adding all predictors to the same model
tree_model3 <- rpart(NSP ~ ASTV+MSTV+ALTV+AC+UC,data=p_train, method = "class")
rpart.plot(tree_model3, main = "Decision Tree for NSP Classification")
# Note: as with model 2, predict() returns class probabilities here and `interval` has no effect.
test3 <- predict(tree_model3, p_test, interval = "prediction")
RSME <- sqrt(mean((p_test$NSP - predict(tree_model3, p_test))^2))
## [1] "RSME of tree model3:"
## [1] 1.20678
The RMSE is somewhat higher (1.20678 versus 1.197712 for model 2), but this model is still the more suitable one for studying all the correlated variables together before making our decision.
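One way to put the three predictor sets side by side is to fit each formula on the same training data and compare test-set accuracy. This is a sketch under stated assumptions: the data are synthetic, the helper `test_accuracy()` is hypothetical, and accuracy is swapped in for the RMSE used above.

```r
library(rpart)
set.seed(7)
n <- 800

# Synthetic stand-in data (NOT the CTG values); the label rule is made up
toy <- data.frame(
  ASTV = runif(n, 10, 90), MSTV = rexp(n), ALTV = runif(n, 0, 90),
  AC = runif(n, 0, 0.02), UC = runif(n, 0, 0.015)
)
score   <- (toy$ASTV > 60) + (toy$MSTV < 0.4) + (toy$AC < 0.003)
toy$NSP <- factor(pmin(score, 2) + 1)   # classes 1, 2, 3

idx   <- sample(n, 0.8 * n)
train <- toy[idx, ]
test  <- toy[-idx, ]

# Hypothetical helper: fit a classification tree and return test-set accuracy
test_accuracy <- function(formula) {
  fit  <- rpart(formula, data = train, method = "class")
  pred <- predict(fit, test, type = "class")
  mean(pred == test$NSP)
}

# The three formulas match the predictor sets of the models above
formulas <- list(
  model1 = NSP ~ ASTV + ALTV + MSTV,
  model2 = NSP ~ AC + UC,
  model3 = NSP ~ ASTV + MSTV + ALTV + AC + UC
)
acc <- sapply(formulas, test_accuracy)
acc
```

Evaluating every model on the same held-out rows keeps the comparison fair, whichever metric is used.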