The SSL package provides a Gaussian mixture model classifier, fitted with expectation maximization, for semi-supervised learning. However, the implementation produces an error when more than four features are present. I have tried to fix the problem by modifying the source from CRAN, but without success. I have made this small example to demonstrate the error.
There is a data.frame of gene expression measurements with known labels that has the following dimensions.
dim(labeled_data)
## [1] 72 10
And the data itself looks like the following.
labeled_data[1:12, 1:8]
## A1BG A1CF A2BP1 A2LD1 A2ML1 A2M A4GALT A4GNT
## 1 164.6427 0.0000 0.7561 112.8242 7.9395 21007.130 225.7089 0.0000
## 2 355.0592 0.0000 2.1149 66.0921 0.2644 4696.467 227.8765 0.0000
## 3 223.8031 0.0000 0.4352 64.1508 1.7410 16436.538 360.3831 0.8705
## 4 42.0627 0.0000 0.0000 103.6612 1.6750 46129.706 877.0041 3.5894
## 5 131.4105 0.0000 0.7686 113.5242 39.2006 25351.222 407.3789 0.0000
## 6 83.4274 0.0000 0.0000 112.1780 8.0606 5507.300 265.0330 1.2897
## 7 199.4404 0.0000 1.6221 127.3804 0.8110 6768.305 779.3998 0.0000
## 8 65.8438 0.0000 2.1135 87.9510 35.5067 38388.071 218.1126 1.6908
## 9 80.7959 0.3651 1.8255 70.0548 1.8255 27741.435 236.2176 0.0000
## 10 85.8338 0.0000 9.1556 102.3654 9.7278 17003.014 477.2362 1.1445
## 11 132.6911 0.0000 0.0000 190.3170 3.1075 7574.798 318.2101 0.6215
## 12 59.1574 0.0000 1.8450 61.5621 8.6101 24218.167 191.2669 7.9951
There is a data.frame of gene expression measurements with unknown labels that has the following dimensions.
dim(unlabeled_data)
## [1] 216 10
And the data itself looks like the following.
unlabeled_data[1:12, 1:8]
## A1BG A1CF A2BP1 A2LD1 A2ML1 A2M A4GALT A4GNT
## 1 119.1522 0.0000 0.0000 84.6118 1244.7568 14903.0143 167.7822 0.4462
## 2 80.1438 0.0000 1.5134 72.1945 7.1888 15558.0628 246.3110 0.0000
## 3 522.9042 0.0000 0.0000 52.5057 1.1494 836.7318 1118.0077 0.0000
## 4 182.5415 0.0000 0.0000 76.0683 2.2398 12264.8924 590.7402 0.0000
## 5 195.6499 0.0000 0.0000 117.7400 0.4409 3953.4551 49.8207 0.0000
## 6 192.8194 0.0000 0.0000 70.9305 0.9872 15728.4870 266.2279 0.0000
## 7 189.0577 0.0000 0.3542 93.4042 3.5423 22170.2763 227.0634 0.0000
## 8 74.7731 0.0000 0.2532 107.1494 1.5194 13234.3533 200.0569 0.2532
## 9 205.0052 0.0000 0.0000 122.4446 0.0000 16641.0205 256.9871 0.0000
## 10 69.3158 0.0000 3.1822 91.2092 0.0000 10629.1169 167.0644 1.5911
## 11 94.7577 0.0000 0.4361 95.7041 2.6168 5353.4476 119.0632 0.4361
## 12 53.0100 0.2993 0.0000 17.8735 10166.5392 4571.8211 26.9360 0.0000
Finally, there is a vector of labels with the following length.
length(labels_vector)
## [1] 72
And the labels themselves look like the following.
labels_vector
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [37] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
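Since the data objects above are shown only as printed output, a stand-in with the same shapes can make the example easier to rerun. The following is a minimal sketch using synthetic values: the names sim_labeled, sim_unlabeled and sim_labels, the gene1..gene10 column names, and the exponential distribution are all assumptions for illustration, not the real expression data. Only the dimensions (72 x 10 and 216 x 10) and the label counts (47 ones followed by 25 twos) are taken from the output above. Whether the error reproduces with synthetic input like this has not been verified here.

```r
# Synthetic stand-in data with the same shapes as the objects printed above.
# All values are random; only the dimensions and label counts match the post.
set.seed(1)

gene_names <- paste0("gene", 1:10)  # hypothetical column names

sim_labeled <- as.data.frame(
  matrix(rexp(72 * 10, rate = 1 / 100), nrow = 72,
         dimnames = list(NULL, gene_names))
)
sim_unlabeled <- as.data.frame(
  matrix(rexp(216 * 10, rate = 1 / 100), nrow = 216,
         dimnames = list(NULL, gene_names))
)

# 47 samples of class 1 followed by 25 samples of class 2, as in labels_vector.
sim_labels <- rep(1:2, times = c(47, 25))

print(dim(sim_labeled))
print(dim(sim_unlabeled))
print(length(sim_labels))
```

These objects could then be passed to the same calls in place of labeled_data, unlabeled_data and labels_vector.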
When run with just four features, the sslGmmEM function from the SSL package produces the following output.
SSL::sslGmmEM(labeled_data[, 1:4], labels_vector, unlabeled_data[, 1:4])
## $para
## [,1] [,2] [,3] [,4]
## [1,] 155.45001 0.01610553 2.7617877 84.83448
## [2,] 115.63840 0.09724083 13.4125509 34.69570
## [3,] 142.66697 0.55022133 0.7828245 94.88403
## [4,] 68.21061 2.11565583 1.6738634 59.94932
##
## $classProb
## [1] 0.8090278 0.1909722
##
## $yu
## [1] 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1
## [37] 1 1 2 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 2 1 2 1 2 1 1 1 1 1 1 1 2 1 1 1 1
## [73] 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1
## [109] 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [145] 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 2 1 2 1
## [181] 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 2
##
## $optLambda
## [1] 1
However, when run with just one more feature, for a total of five features, the method produces an error.
try(SSL::sslGmmEM(labeled_data[, 1:5], labels_vector, unlabeled_data[, 1:5]))
## Error in all.label[index] : invalid subscript type 'list'
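For context, R raises this message whenever a list, rather than an integer, logical or character vector, is used as the subscript of an atomic vector. The sketch below reproduces the same class of error in base R without the SSL package. The variables all_label and index are hypothetical stand-ins; this does not show what sslGmmEM does internally, only what kind of value would trigger the message.

```r
# Base R only: reproduce the same class of error without the SSL package.
all_label <- c(1, 1, 2, 2, 1)    # hypothetical label vector
index <- list(c(1, 3), c(2, 4))  # hypothetical: positions stored as a list

# Subsetting an atomic vector with a list is an error; capture its message.
msg <- tryCatch(
  all_label[index],
  error = function(e) conditionMessage(e)
)
print(msg)

# Flattening the list to a plain vector makes the subscript valid.
flat <- all_label[unlist(index)]
print(flat)
```

Running the failing call without try() and then calling traceback() should show which internal frame performs the all.label[index] subsetting, which may help locate where a list is being produced instead of a vector.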
Since learning with only four features is not useful, this would need to be fixed before the SSL package could be used in production.
sessionInfo()
## R Under development (unstable) (2019-01-26 r76018)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 9 (stretch)
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BiocStyle_2.11.0 BiocManager_1.30.4
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 lubridate_1.7.4 lattice_0.20-38
## [4] class_7.3-15 assertthat_0.2.0 digest_0.6.18
## [7] ipred_0.9-8 foreach_1.4.4 mime_0.6
## [10] R6_2.3.0 plyr_1.8.4 NetPreProc_1.1
## [13] SSL_0.1 stats4_3.6.0 evaluate_0.13
## [16] e1071_1.7-0.1 ggplot2_3.1.0 highr_0.7
## [19] pillar_1.3.1 rlang_0.3.1 lazyeval_0.2.1
## [22] caret_6.0-81 rstudioapi_0.9.0 data.table_1.12.0
## [25] miniUI_0.1.1.1 rpart_4.1-13 Matrix_1.2-15
## [28] combinat_0.0-8 rmarkdown_1.11 splines_3.6.0
## [31] gower_0.1.2 stringr_1.4.0 questionr_0.7.0
## [34] munsell_0.5.0 proxy_0.4-22 shiny_1.2.0
## [37] compiler_3.6.0 httpuv_1.4.5.1 xfun_0.4
## [40] pkgconfig_2.0.2 BiocGenerics_0.29.1 htmltools_0.3.6
## [43] nnet_7.3-12 tidyselect_0.2.5 prodlim_2018.04.18
## [46] tibble_2.0.1 bookdown_0.9 codetools_0.2-16
## [49] withr_2.1.2 crayon_1.3.4 dplyr_0.7.8
## [52] later_0.8.0 MASS_7.3-51.1 recipes_0.1.4
## [55] ModelMetrics_1.2.2 grid_3.6.0 nlme_3.1-137
## [58] xtable_1.8-3 gtable_0.2.0 magrittr_1.5
## [61] scales_1.0.0 graph_1.61.0 stringi_1.2.4
## [64] reshape2_1.4.3 promises_1.0.1 bindrcpp_0.2.2
## [67] timeDate_3043.102 generics_0.0.2 xgboost_0.81.0.1
## [70] lava_1.6.5 klaR_0.6-14 iterators_1.0.10
## [73] tools_3.6.0 glue_1.3.0 purrr_0.3.0
## [76] survival_2.43-3 parallel_3.6.0 yaml_2.2.0
## [79] colorspace_1.4-0 knitr_1.21 bindr_0.1.1