library(readxl)
## Warning: package 'readxl' was built under R version 3.5.3
ctg <- read_xlsx("F:/Machine Learning/Data Science/Machine Learning/Multinomial/CTG.xlsx")
head(ctg)
## # A tibble: 6 x 22
##      LB      AC    FM      UC      DL    DS      DP  ASTV  MSTV  ALTV  MLTV
##   <dbl>   <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1   120 0           0 0       0           0 0          73   0.5    43   2.4
## 2   132 0.00638     0 0.00638 0.00319     0 0          17   2.1     0  10.4
## 3   133 0.00332     0 0.00831 0.00332     0 0          16   2.1     0  13.4
## 4   134 0.00256     0 0.00768 0.00256     0 0          16   2.4     0  23  
## 5   132 0.00651     0 0.00814 0           0 0          16   2.4     0  19.9
## 6   134 0.00105     0 0.0105  0.00944     0 0.00210    26   5.9     0   0  
## # ... with 11 more variables: Width <dbl>, Min <dbl>, Max <dbl>,
## #   Nmax <dbl>, Nzeros <dbl>, Mode <dbl>, Mean <dbl>, Median <dbl>,
## #   Variance <dbl>, Tendency <dbl>, NSP <dbl>
table(ctg$NSP)
## 
##    1    2    3 
## 1655  295  176

1 = normal 2 = suspect 3 = pathology

str(ctg)
## Classes 'tbl_df', 'tbl' and 'data.frame':    2126 obs. of  22 variables:
##  $ LB      : num  120 132 133 134 132 134 134 122 122 122 ...
##  $ AC      : num  0 0.00638 0.00332 0.00256 0.00651 ...
##  $ FM      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ UC      : num  0 0.00638 0.00831 0.00768 0.00814 ...
##  $ DL      : num  0 0.00319 0.00332 0.00256 0 ...
##  $ DS      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ DP      : num  0 0 0 0 0 ...
##  $ ASTV    : num  73 17 16 16 16 26 29 83 84 86 ...
##  $ MSTV    : num  0.5 2.1 2.1 2.4 2.4 5.9 6.3 0.5 0.5 0.3 ...
##  $ ALTV    : num  43 0 0 0 0 0 0 6 5 6 ...
##  $ MLTV    : num  2.4 10.4 13.4 23 19.9 0 0 15.6 13.6 10.6 ...
##  $ Width   : num  64 130 130 117 117 150 150 68 68 68 ...
##  $ Min     : num  62 68 68 53 53 50 50 62 62 62 ...
##  $ Max     : num  126 198 198 170 170 200 200 130 130 130 ...
##  $ Nmax    : num  2 6 5 11 9 5 6 0 0 1 ...
##  $ Nzeros  : num  0 1 1 0 0 3 3 0 0 0 ...
##  $ Mode    : num  120 141 141 137 137 76 71 122 122 122 ...
##  $ Mean    : num  137 136 135 134 136 107 107 122 122 122 ...
##  $ Median  : num  121 140 138 137 138 107 106 123 123 123 ...
##  $ Variance: num  73 12 13 13 11 170 215 3 3 1 ...
##  $ Tendency: num  1 0 0 1 1 0 0 1 1 1 ...
##  $ NSP     : num  2 1 1 1 1 3 3 3 3 3 ...

All variables are in numerical

The target variable, NSP, is supposed to be categorical. So we need to convert this variable as factor variable.

ctg.mydata <- ctg

#convert into factor variable
ctg.mydata$NSP <- as.factor(ctg.mydata$NSP)

str(ctg.mydata)
## Classes 'tbl_df', 'tbl' and 'data.frame':    2126 obs. of  22 variables:
##  $ LB      : num  120 132 133 134 132 134 134 122 122 122 ...
##  $ AC      : num  0 0.00638 0.00332 0.00256 0.00651 ...
##  $ FM      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ UC      : num  0 0.00638 0.00831 0.00768 0.00814 ...
##  $ DL      : num  0 0.00319 0.00332 0.00256 0 ...
##  $ DS      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ DP      : num  0 0 0 0 0 ...
##  $ ASTV    : num  73 17 16 16 16 26 29 83 84 86 ...
##  $ MSTV    : num  0.5 2.1 2.1 2.4 2.4 5.9 6.3 0.5 0.5 0.3 ...
##  $ ALTV    : num  43 0 0 0 0 0 0 6 5 6 ...
##  $ MLTV    : num  2.4 10.4 13.4 23 19.9 0 0 15.6 13.6 10.6 ...
##  $ Width   : num  64 130 130 117 117 150 150 68 68 68 ...
##  $ Min     : num  62 68 68 53 53 50 50 62 62 62 ...
##  $ Max     : num  126 198 198 170 170 200 200 130 130 130 ...
##  $ Nmax    : num  2 6 5 11 9 5 6 0 0 1 ...
##  $ Nzeros  : num  0 1 1 0 0 3 3 0 0 0 ...
##  $ Mode    : num  120 141 141 137 137 76 71 122 122 122 ...
##  $ Mean    : num  137 136 135 134 136 107 107 122 122 122 ...
##  $ Median  : num  121 140 138 137 138 107 106 123 123 123 ...
##  $ Variance: num  73 12 13 13 11 170 215 3 3 1 ...
##  $ Tendency: num  1 0 0 1 1 0 0 1 1 1 ...
##  $ NSP     : Factor w/ 3 levels "1","2","3": 2 1 1 1 1 3 3 3 3 3 ...

One other thing to note here is that since the data has three levels in categorical variable, we have to identify the reference level. So we are going to identify ‘1’ as a reference level. Hence, ‘1’ will mean normal patient

#relevel
ctg.mydata$out <- relevel(ctg.mydata$NSP, ref='1')

Now the data is ready to develop multinomial logistic regression.

#multinomial logistic regression
library(nnet)
mult.model <- multinom(out ~ LB+AC+FM, data = ctg.mydata)
## # weights:  15 (8 variable)
## initial  value 2335.649726 
## iter  10 value 1289.818570
## iter  20 value 1041.240745
## iter  30 value 1036.259306
## final  value 1019.987651 
## converged

The results show that it started with very high error and after 10 iterations it reduced to 1289 from 2335 and so on and finally it converges to a lower value 1019.

summary(mult.model)
## Call:
## multinom(formula = out ~ LB + AC + FM, data = ctg.mydata)
## 
## Coefficients:
##   (Intercept)           LB        AC       FM
## 2 -16.2182977  0.112918884 -829.1624 6.137294
## 3  -0.4208593 -0.006730701 -789.8815 8.231495
## 
## Std. Errors:
##   (Intercept)          LB          AC       FM
## 2    1.261066 0.009050217 0.005518354 1.746013
## 3    1.212057 0.009170115 0.009872315 1.372880
## 
## Residual Deviance: 2039.975 
## AIC: 2055.975

2 means second level, remember 1 is the first level. So 2 is referring to suspect and 3 means pathology. For 2, we have -16.2182977 and the first coefficient for LB is 0.1129 which is positive. For AC, -829.16, it is negative.

Once we have the right coefficeints and once we have conducted a two-tail z-test and we know that which features are significant we can retain them and remove the features that are not statistically significant and then find out y1 and y2 and then use them for calculating the three probabilities

So we can calculate probabilities in r using ‘predict’ command

#equations for calculating probabilities
#predict

pred <- predict(mult.model, ctg.mydata)

head(pred)
## [1] 1 1 1 1 1 1
## Levels: 1 2 3
#prediction of probabilities

prob <- predict(mult.model, ctg.mydata, type = "prob")

head(prob)
##           1           2           3
## 1 0.7341566 0.050942149 0.214901291
## 2 0.9969034 0.001352475 0.001744078
## 3 0.9628285 0.018450596 0.018720923
## 4 0.9297324 0.037501998 0.032765616
## 5 0.9972224 0.001209563 0.001568084
## 6 0.7951989 0.112322028 0.092479056

What we see is for every patient there are three probabilities. For example, for the first patient being normal is .734 so there is a 73.4% chance of patient being normal is the highest so we will predict that this patient is going to be normal.

When we add three probabilities we will obviously get 1.

#prediction of probabilities only for some specific patients

predict(mult.model, ctg.mydata[c(4,1020,876), ], type = "prob")
##           1          2          3
## 1 0.9297324 0.03750200 0.03276562
## 2 0.9608279 0.02175073 0.01742138
## 3 0.9152839 0.04786749 0.03684863

At 1020 patient, there is a 96% chance that the patient being normal.

#misclassification error
library(caret)
## Warning: package 'caret' was built under R version 3.5.3
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.5.3
tab <- table(pred, ctg.mydata$NSP)

tab
##     
## pred    1    2    3
##    1 1592  165  137
##    2   61  128   27
##    3    2    2   12
1-sum(diag(tab))/sum(tab)
## [1] 0.1853246

Compare the prediction of model with actual data to see how often the match are not there. So the misclassification error is 18.5%. So 18.5% of the times the model misclassifies a patient.

#two-tailed z-test

z <- summary(mult.model)$coefficients/summary(mult.model)$standard.errors

p <- (1 - pnorm(abs(z), 0, 1)) * 2

p
##   (Intercept)        LB AC           FM
## 2   0.0000000 0.0000000  0 4.396983e-04
## 3   0.7284205 0.4629596  0 2.025025e-09

We can see the coefficient for LB has a p-value of 0.0000000 coeff for LB is almost zero when the p-value is small obviously the confidence level is high because confidence level is 1-0.0000000(LB), so the confidence level is close to 100%, so this variable is playing a significant role.

In this case, we used class 1 as the reference level which is classified for ‘normal’; 2 is classifed for ‘suspect’; 3 is classified for ‘pathologic’.

LB for third classification, p-value of 0.4629596, 1-.46 = 0.54 so confidence level would be about 54% obviously that’s too small. We look for like 90%, 95% or 99% confidence.

So this means LB doesn’t have significant contribution in the model when 1 is reference and when we are looking at third level for the response.

For AC, both the p-values are small so obviously AC is sigificant.

Also for FM, we have like 4.396983e-04 (4.3 times 10 to the power of negative 4) for class 2; and 2.025025e-09 (2.0 times 10 to the power of negative 9) for class 3, so these coefficients are significant.

In fact in this case we can use all three because they are significant either for both or at least for one.