library(readxl)
## Warning: package 'readxl' was built under R version 3.5.3
ctg <- read_xlsx("F:/Machine Learning/Data Science/Machine Learning/Multinomial/CTG.xlsx")
head(ctg)
## # A tibble: 6 x 22
## LB AC FM UC DL DS DP ASTV MSTV ALTV MLTV
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 120 0 0 0 0 0 0 73 0.5 43 2.4
## 2 132 0.00638 0 0.00638 0.00319 0 0 17 2.1 0 10.4
## 3 133 0.00332 0 0.00831 0.00332 0 0 16 2.1 0 13.4
## 4 134 0.00256 0 0.00768 0.00256 0 0 16 2.4 0 23
## 5 132 0.00651 0 0.00814 0 0 0 16 2.4 0 19.9
## 6 134 0.00105 0 0.0105 0.00944 0 0.00210 26 5.9 0 0
## # ... with 11 more variables: Width <dbl>, Min <dbl>, Max <dbl>,
## # Nmax <dbl>, Nzeros <dbl>, Mode <dbl>, Mean <dbl>, Median <dbl>,
## # Variance <dbl>, Tendency <dbl>, NSP <dbl>
table(ctg$NSP)
##
## 1 2 3
## 1655 295 176
1 = normal 2 = suspect 3 = pathology
str(ctg)
## Classes 'tbl_df', 'tbl' and 'data.frame': 2126 obs. of 22 variables:
## $ LB : num 120 132 133 134 132 134 134 122 122 122 ...
## $ AC : num 0 0.00638 0.00332 0.00256 0.00651 ...
## $ FM : num 0 0 0 0 0 0 0 0 0 0 ...
## $ UC : num 0 0.00638 0.00831 0.00768 0.00814 ...
## $ DL : num 0 0.00319 0.00332 0.00256 0 ...
## $ DS : num 0 0 0 0 0 0 0 0 0 0 ...
## $ DP : num 0 0 0 0 0 ...
## $ ASTV : num 73 17 16 16 16 26 29 83 84 86 ...
## $ MSTV : num 0.5 2.1 2.1 2.4 2.4 5.9 6.3 0.5 0.5 0.3 ...
## $ ALTV : num 43 0 0 0 0 0 0 6 5 6 ...
## $ MLTV : num 2.4 10.4 13.4 23 19.9 0 0 15.6 13.6 10.6 ...
## $ Width : num 64 130 130 117 117 150 150 68 68 68 ...
## $ Min : num 62 68 68 53 53 50 50 62 62 62 ...
## $ Max : num 126 198 198 170 170 200 200 130 130 130 ...
## $ Nmax : num 2 6 5 11 9 5 6 0 0 1 ...
## $ Nzeros : num 0 1 1 0 0 3 3 0 0 0 ...
## $ Mode : num 120 141 141 137 137 76 71 122 122 122 ...
## $ Mean : num 137 136 135 134 136 107 107 122 122 122 ...
## $ Median : num 121 140 138 137 138 107 106 123 123 123 ...
## $ Variance: num 73 12 13 13 11 170 215 3 3 1 ...
## $ Tendency: num 1 0 0 1 1 0 0 1 1 1 ...
## $ NSP : num 2 1 1 1 1 3 3 3 3 3 ...
All variables are in numerical
The target variable, NSP, is supposed to be categorical. So we need to convert this variable as factor variable.
ctg.mydata <- ctg
#convert into factor variable
ctg.mydata$NSP <- as.factor(ctg.mydata$NSP)
str(ctg.mydata)
## Classes 'tbl_df', 'tbl' and 'data.frame': 2126 obs. of 22 variables:
## $ LB : num 120 132 133 134 132 134 134 122 122 122 ...
## $ AC : num 0 0.00638 0.00332 0.00256 0.00651 ...
## $ FM : num 0 0 0 0 0 0 0 0 0 0 ...
## $ UC : num 0 0.00638 0.00831 0.00768 0.00814 ...
## $ DL : num 0 0.00319 0.00332 0.00256 0 ...
## $ DS : num 0 0 0 0 0 0 0 0 0 0 ...
## $ DP : num 0 0 0 0 0 ...
## $ ASTV : num 73 17 16 16 16 26 29 83 84 86 ...
## $ MSTV : num 0.5 2.1 2.1 2.4 2.4 5.9 6.3 0.5 0.5 0.3 ...
## $ ALTV : num 43 0 0 0 0 0 0 6 5 6 ...
## $ MLTV : num 2.4 10.4 13.4 23 19.9 0 0 15.6 13.6 10.6 ...
## $ Width : num 64 130 130 117 117 150 150 68 68 68 ...
## $ Min : num 62 68 68 53 53 50 50 62 62 62 ...
## $ Max : num 126 198 198 170 170 200 200 130 130 130 ...
## $ Nmax : num 2 6 5 11 9 5 6 0 0 1 ...
## $ Nzeros : num 0 1 1 0 0 3 3 0 0 0 ...
## $ Mode : num 120 141 141 137 137 76 71 122 122 122 ...
## $ Mean : num 137 136 135 134 136 107 107 122 122 122 ...
## $ Median : num 121 140 138 137 138 107 106 123 123 123 ...
## $ Variance: num 73 12 13 13 11 170 215 3 3 1 ...
## $ Tendency: num 1 0 0 1 1 0 0 1 1 1 ...
## $ NSP : Factor w/ 3 levels "1","2","3": 2 1 1 1 1 3 3 3 3 3 ...
One other thing to note here is that since the data has three levels in categorical variable, we have to identify the reference level. So we are going to identify ‘1’ as a reference level. Hence, ‘1’ will mean normal patient
#relevel
ctg.mydata$out <- relevel(ctg.mydata$NSP, ref='1')
Now the data is ready to develop multinomial logistic regression.
#multinomial logistic regression
library(nnet)
mult.model <- multinom(out ~ LB+AC+FM, data = ctg.mydata)
## # weights: 15 (8 variable)
## initial value 2335.649726
## iter 10 value 1289.818570
## iter 20 value 1041.240745
## iter 30 value 1036.259306
## final value 1019.987651
## converged
The results show that it started with very high error and after 10 iterations it reduced to 1289 from 2335 and so on and finally it converges to a lower value 1019.
summary(mult.model)
## Call:
## multinom(formula = out ~ LB + AC + FM, data = ctg.mydata)
##
## Coefficients:
## (Intercept) LB AC FM
## 2 -16.2182977 0.112918884 -829.1624 6.137294
## 3 -0.4208593 -0.006730701 -789.8815 8.231495
##
## Std. Errors:
## (Intercept) LB AC FM
## 2 1.261066 0.009050217 0.005518354 1.746013
## 3 1.212057 0.009170115 0.009872315 1.372880
##
## Residual Deviance: 2039.975
## AIC: 2055.975
2 means second level, remember 1 is the first level. So 2 is referring to suspect and 3 means pathology. For 2, we have -16.2182977 and the first coefficient for LB is 0.1129 which is positive. For AC, -829.16, it is negative.
Once we have the right coefficeints and once we have conducted a two-tail z-test and we know that which features are significant we can retain them and remove the features that are not statistically significant and then find out y1 and y2 and then use them for calculating the three probabilities
So we can calculate probabilities in r using ‘predict’ command
#equations for calculating probabilities
#predict
pred <- predict(mult.model, ctg.mydata)
head(pred)
## [1] 1 1 1 1 1 1
## Levels: 1 2 3
#prediction of probabilities
prob <- predict(mult.model, ctg.mydata, type = "prob")
head(prob)
## 1 2 3
## 1 0.7341566 0.050942149 0.214901291
## 2 0.9969034 0.001352475 0.001744078
## 3 0.9628285 0.018450596 0.018720923
## 4 0.9297324 0.037501998 0.032765616
## 5 0.9972224 0.001209563 0.001568084
## 6 0.7951989 0.112322028 0.092479056
What we see is for every patient there are three probabilities. For example, for the first patient being normal is .734 so there is a 73.4% chance of patient being normal is the highest so we will predict that this patient is going to be normal.
When we add three probabilities we will obviously get 1.
#prediction of probabilities only for some specific patients
predict(mult.model, ctg.mydata[c(4,1020,876), ], type = "prob")
## 1 2 3
## 1 0.9297324 0.03750200 0.03276562
## 2 0.9608279 0.02175073 0.01742138
## 3 0.9152839 0.04786749 0.03684863
At 1020 patient, there is a 96% chance that the patient being normal.
#misclassification error
library(caret)
## Warning: package 'caret' was built under R version 3.5.3
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.5.3
tab <- table(pred, ctg.mydata$NSP)
tab
##
## pred 1 2 3
## 1 1592 165 137
## 2 61 128 27
## 3 2 2 12
1-sum(diag(tab))/sum(tab)
## [1] 0.1853246
Compare the prediction of model with actual data to see how often the match are not there. So the misclassification error is 18.5%. So 18.5% of the times the model misclassifies a patient.
#two-tailed z-test
z <- summary(mult.model)$coefficients/summary(mult.model)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
## (Intercept) LB AC FM
## 2 0.0000000 0.0000000 0 4.396983e-04
## 3 0.7284205 0.4629596 0 2.025025e-09
We can see the coefficient for LB has a p-value of 0.0000000 coeff for LB is almost zero when the p-value is small obviously the confidence level is high because confidence level is 1-0.0000000(LB), so the confidence level is close to 100%, so this variable is playing a significant role.
In this case, we used class 1 as the reference level which is classified for ‘normal’; 2 is classifed for ‘suspect’; 3 is classified for ‘pathologic’.
LB for third classification, p-value of 0.4629596, 1-.46 = 0.54 so confidence level would be about 54% obviously that’s too small. We look for like 90%, 95% or 99% confidence.
So this means LB doesn’t have significant contribution in the model when 1 is reference and when we are looking at third level for the response.
For AC, both the p-values are small so obviously AC is sigificant.
Also for FM, we have like 4.396983e-04 (4.3 times 10 to the power of negative 4) for class 2; and 2.025025e-09 (2.0 times 10 to the power of negative 9) for class 3, so these coefficients are significant.
In fact in this case we can use all three because they are significant either for both or at least for one.