Title: Supervised Fuzzy Partitioning
Authors: Ashtari et al. 
Year: 2020
Journal: Pattern Recognition
DOI: https://doi.org/10.1016/j.patcog.2019.107013


1. Topic

  • Apply fuzzy c-means to supervised tasks


2. Motivation

  • To capture more complex patterns and yield better performance on high-dimensional data


3. Problems

  • Few efforts have been made to generalize k-means-style partitioning to supervised learning problems


4. Proposed method

  • Objective function
    \[\small \min_{\mathbf{U, V, W, Z}} \quad \sum_{j=1}^{k} \sum_{i=1}^{n} u_{ij} \Vert\mathbf{x}_i-\mathbf{v}_j\Vert^2_{\mathbf{w}_j} + \alpha\sum_{j=1}^{k} \sum_{i=1}^{n} u_{ij}\ell(y_i, \mathbf{z}_j)+\gamma\sum_{j=1}^{k} \sum_{i=1}^{n} u_{ij}\ln{u_{ij}} + \lambda \sum_{j=1}^{k} \sum_{l=1}^{p} w_{jl}\ln{w_{jl}} \] \[ \begin{aligned} \small \text{subject to} \qquad \sum_{j=1}^{k}u_{ij}=1,\qquad i = 1, \cdots,n, \\ \small u_{ij} \geq 0, \qquad i = 1,\cdots,n, \quad j = 1,\cdots,k, \\ \small \sum_{l=1}^{p}w_{jl}=1,\qquad j = 1, \cdots,k, \\ \small w_{jl} \geq 0, \qquad j = 1,\cdots,k, \quad l = 1,\cdots,p. \end{aligned} \]
    • weighted Euclidean distance: \(\small \Vert \mathbf{x}_i - \mathbf{x}_{i'} \Vert_{\mathbf{w}} = (\sum_{l=1}^{p} w_l(x_{il}-x_{i'l})^2)^{1/2}\)
    • \(\small \mathbf{U} \text{: Membership matrix (} u_{ij} \text{: membership degree of point } \mathbf{x}_i \text{ in the } j \text{th cluster)}\)
    • \(\small \mathbf{V} \text{: Centroid matrix (} \mathbf{v}_j \text{: center of the } j \text{th cluster)}\)
    • \(\small \mathbf{W} \text{: Feature weight matrix (} w_{jl} \text{: weight of the } l \text{th feature in the } j \text{th cluster)}\)
    • \(\small \mathbf{Z} \text{: Label prototype matrix (} \mathbf{z}_{j} \text{: label prototype of the } j \text{th cluster)}\)
    • Computationally intractable to find a global minimizer for the objective function
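    • Objective evaluation (sketch): a minimal numpy sketch of computing this objective, assuming a classification task with one-hot labels and a squared-error stand-in for the generic loss \(\small \ell\); function and variable names are illustrative, not from the paper.

```python
import numpy as np

def sfp_objective(X, Y, U, V, W, Z, alpha, gamma, lam):
    """Evaluate the SFP objective (sketch).

    X: (n, p) data, Y: (n, c) one-hot labels, U: (n, k) memberships,
    V: (k, p) centers, W: (k, p) feature weights, Z: (k, c) label prototypes.
    The label loss l(y_i, z_j) is taken as squared error here (an assumption;
    the paper keeps l generic).
    """
    # weighted squared distances ||x_i - v_j||^2_{w_j}, shape (n, k)
    d = np.einsum('kp,nkp->nk', W, (X[:, None, :] - V[None, :, :]) ** 2)
    # label loss l(y_i, z_j), shape (n, k)
    lab = ((Y[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    fit = (U * (d + alpha * lab)).sum()
    ent_u = gamma * (U * np.log(U + 1e-12)).sum()   # membership entropy term
    ent_w = lam * (W * np.log(W + 1e-12)).sum()     # feature-weight entropy term
    return fit + ent_u + ent_w
```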


  • Block Coordinate Descent solver for SFP
    • To find a local minimizer
    • Four Blocks: Membership(\(\small \mathbf{U}\)), Centers(\(\small \mathbf{V}\)), Label prototypes(\(\small \mathbf{Z}\)), Feature weights(\(\small \mathbf{W}\))
    • In each step, three of the blocks are kept fixed, and the objective function is minimized with respect to the remaining one


  • SFP algorithm
    • Initialization
      • Centers, Label prototypes: randomly choose \(\small k\) observations
      • Weights: \(\small w_{jl} = 1/p\)
    • First step: Update membership matrix \[\small \min_{\mathbf{U}} \quad \sum_{j=1}^{k} \sum_{i=1}^{n} u_{ij}d_{ij} + \gamma \sum_{j=1}^{k} \sum_{i=1}^{n} u_{ij}\ln{u_{ij}}\] \[ \begin{aligned} \small \text{subject to} \qquad \sum_{j=1}^{k}u_{ij}=1,\qquad i = 1, \cdots,n, \\ \small u_{ij} \geq 0, \qquad i = 1,\cdots,n, \quad j = 1,\cdots,k. \\ \end{aligned} \] \[\small d_{ij}=\Vert \mathbf{x}_i - \mathbf{v}_j \Vert ^2_{\mathbf{w}_j} + \alpha\ell(y_i, \mathbf{z}_j)\] \[\small \hat u_{ij} = \frac{\exp{(-d_{ij}/\gamma)}}{\sum_{j'=1}^{k} \exp{(-d_{ij'}/\gamma)}}\]
    • Second step: Update centers and label prototype
      • Estimated centers \[\small \min_{\mathbf{V}} \quad F(\mathbf{V}) = \sum_{j=1}^{k} \sum_{i=1}^{n} u_{ij}\Vert \mathbf{x}_i - \mathbf{v}_j \Vert ^2_{\mathbf{w}_j}\] \[\small \hat {\mathbf{v}_j} = \frac{\sum_{i=1}^{n} u_{ij}\mathbf{x}_i}{\sum_{i=1}^{n} u_{ij}}\]
      • Label prototype \[\small \hat {\mathbf{z}_j} = \operatorname*{argmin}_{\mathbf{z} \in \mathscr{Z}} \sum_{i=1}^{n} u_{ij}\ell(y_i, \mathbf{z})\]
    • Third step: Update feature weights \[\small \hat {\mathbf{w}_{jl}} = \frac{\exp{(-s_{jl}/\lambda)}}{\sum_{l'=1}^{p} \exp{(-s_{jl'}/\lambda)}} \quad (s_{jl} = \sum_{i=1}^{n} u_{ij}(x_{il} - v_{jl})^2) \]
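    • Training loop (sketch): the three update steps above combine into the BCD iteration below; this is not the authors' code, and it assumes one-hot labels with a squared-error loss \(\small \ell\), under which the label-prototype update reduces to a membership-weighted mean.

```python
import numpy as np

def sfp_fit(X, Y, k, alpha, gamma, lam, n_iter=100, seed=0):
    """Fit SFP by block coordinate descent (sketch).

    X: (n, p) data, Y: (n, c) one-hot labels. Assumes a squared-error
    label loss, so the prototype update is a weighted mean; a fixed
    iteration count stands in for a convergence test.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.choice(n, size=k, replace=False)          # k random observations
    V, Z = X[idx].astype(float), Y[idx].astype(float)   # centers, label prototypes
    W = np.full((k, p), 1.0 / p)                        # feature weights w_jl = 1/p

    for _ in range(n_iter):
        # Step 1: memberships u_ij proportional to exp(-d_ij / gamma)
        d = np.einsum('kp,nkp->nk', W, (X[:, None, :] - V[None, :, :]) ** 2)
        d += alpha * ((Y[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        E = np.exp(-(d - d.min(axis=1, keepdims=True)) / gamma)  # row shift for stability
        U = E / E.sum(axis=1, keepdims=True)

        # Step 2: centers and label prototypes (membership-weighted means)
        Uc = U.sum(axis=0)[:, None]
        V = U.T @ X / Uc
        Z = U.T @ Y / Uc

        # Step 3: feature weights w_jl proportional to exp(-s_jl / lambda)
        S = np.einsum('nk,nkp->kp', U, (X[:, None, :] - V[None, :, :]) ** 2)
        Ew = np.exp(-(S - S.min(axis=1, keepdims=True)) / lam)
        W = Ew / Ew.sum(axis=1, keepdims=True)

    return U, V, W, Z
```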

    • Test phase
      • Let \(\small \mathbf{x}'\) be a new data point
      • Estimate the membership vector \(\small \mathbf{u}'\) \[\small \min_{\mathbf{u}'} \quad \sum_{j=1}^{k} u'_j\Vert \mathbf{x}' - \mathbf{v}_j \Vert ^2_{\mathbf{w}_j} + \gamma \sum_{j=1}^{k} u'_j\ln{u'_j} \] \[\small \text{subject to} \quad \sum_{j=1}^{k} u'_j = 1, \quad u'_j \geq 0\] \[\small \hat u'_j = \frac{\exp{(-d'_j/\gamma)}}{\sum_{j'=1}^{k} \exp{(-d'_{j'}/\gamma)}} \quad (d'_j=\Vert \mathbf{x}' - \mathbf{v}_j \Vert^2_{\mathbf{w}_j})\]
      • Label of \(\small \mathbf{x}'\) \[\small \hat y' = \operatorname*{argmin}_{y \in \mathscr{Y}} \ell(y, \sum_{j=1}^{k} u'_j\mathbf{z}_j) \]
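      • Prediction (sketch): a matching test-phase routine, again under the one-hot / squared-error assumption, where \(\small \operatorname{argmin}_y \ell(y, \sum_j u'_j\mathbf{z}_j)\) reduces to the argmax of the mixed prototype.

```python
import numpy as np

def sfp_predict(Xnew, V, W, Z, gamma):
    """Predict labels for new points with a fitted SFP model (sketch).

    Memberships u' use only the weighted distances to the centers
    (no label term). With one-hot prototypes and squared-error loss,
    argmin_y l(y, sum_j u'_j z_j) is the argmax of the mixed prototype.
    """
    d = np.einsum('kp,nkp->nk', W, (Xnew[:, None, :] - V[None, :, :]) ** 2)
    E = np.exp(-(d - d.min(axis=1, keepdims=True)) / gamma)
    U = E / E.sum(axis=1, keepdims=True)   # membership vectors u'
    return np.argmax(U @ Z, axis=1)        # class index of the mixed prototype
```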

5. Experiments

  • Synthetic datasets
    • Comparison with random forest (datasets: spiral (\(\small n = 312\)), two circles (\(\small n = 1000\)), XOR (\(\small n = 1000\)))
    • Three-component mixture model(\(\small n\) = 1500) \[\small p(\mathbf{x})=0.25p(\mathbf{x}|y=1)+0.25p(\mathbf{x}|y=2)+0.5p(\mathbf{x}|y=3)\] \[\small p(\mathbf{x}|y=1)=\mathscr{N}(\mathbf{x}|[0 \, 0]^T,\, diag(15, 0.05))\] \[\small p(\mathbf{x}|y=2)=\mathscr{N}(\mathbf{x}|[-12 \, 0]^T, \mathbf{I}) \] \[\small p(\mathbf{x}|y=3)=\frac{2}{3}\mathscr{N}(\mathbf{x}|[0 \, 8]^T, 4\mathbf{I}) + \frac{1}{3}\mathscr{N}(\mathbf{x}|[0 \, -4]^T, \mathbf{I}) \]
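    • Sampling sketch for this mixture (class 3 is itself a two-component Gaussian mixture); the function name and seed handling are illustrative.

```python
import numpy as np

def sample_mixture(n=1500, seed=0):
    """Draw (X, y) from the three-class mixture defined above."""
    rng = np.random.default_rng(seed)
    y = rng.choice([1, 2, 3], size=n, p=[0.25, 0.25, 0.5])
    X = np.empty((n, 2))
    for i, yi in enumerate(y):
        if yi == 1:
            X[i] = rng.multivariate_normal([0, 0], np.diag([15, 0.05]))
        elif yi == 2:
            X[i] = rng.multivariate_normal([-12, 0], np.eye(2))
        elif rng.random() < 2 / 3:       # class 3, first component
            X[i] = rng.multivariate_normal([0, 8], 4 * np.eye(2))
        else:                            # class 3, second component
            X[i] = rng.multivariate_normal([0, -4], np.eye(2))
    return X, y
```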


  • UCI datasets
    • 5-fold CV, 20 datasets, grid search for determining the hyperparameters (\(\small k, \gamma', \alpha', \lambda'\)); a CV sketch follows this section's list
    • Accuracy
  • High-dimensional datasets
    • Colon(\(\small n=62, p=2000\)), Leukemia(\(\small n=72, p=7129\)), Lung(\(\small n=181, p=12533\)), Lymphoma(\(\small n=45, p=4026\))
    • Accuracy and AUC
    • The role of \(\small \lambda\) in high-dimensional settings
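  • CV grid-search sketch (referenced from the UCI setup above): a minimal 5-fold CV grid search over \(\small (k, \alpha, \gamma, \lambda)\), reusing the `sfp_fit` / `sfp_predict` sketches above; the grid keys and the use of the raw hyperparameters rather than the paper's primed parametrization \(\small (\gamma', \alpha', \lambda')\) are assumptions.

```python
import numpy as np
from itertools import product
from sklearn.model_selection import StratifiedKFold

def cv_grid_search(X, Y, grid, n_splits=5, seed=0):
    """5-fold CV grid search over SFP hyperparameters (sketch).

    Y: (n, c) one-hot labels. `grid` maps 'k', 'alpha', 'gamma', 'lam'
    to candidate lists. Reuses sfp_fit / sfp_predict defined above.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    y_idx = Y.argmax(axis=1)
    best, best_acc = None, -np.inf
    for k, alpha, gamma, lam in product(grid['k'], grid['alpha'],
                                        grid['gamma'], grid['lam']):
        accs = []
        for tr, te in skf.split(X, y_idx):
            _, V, W, Z = sfp_fit(X[tr], Y[tr], k, alpha, gamma, lam)
            accs.append(np.mean(sfp_predict(X[te], V, W, Z, gamma) == y_idx[te]))
        if np.mean(accs) > best_acc:
            best, best_acc = {'k': k, 'alpha': alpha, 'gamma': gamma, 'lam': lam}, np.mean(accs)
    return best, best_acc
```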

6. References

  • W. Pedrycz, Algorithms of fuzzy clustering with partial supervision, Pattern Recognit. Lett. 3 (1) (1985) 13–20. (SSFCM)
  • S. Basu, A. Banerjee, R.J. Mooney, Active semi-supervision for pairwise constrained clustering, in: Proceedings of the 2004 SIAM International Conference on Data Mining, SIAM, 2004, pp. 333–344. (PCKmeans)
  • M. Bilenko, S. Basu, R.J. Mooney, Integrating constraints and metric learning in semi-supervised clustering, in: Proceedings of the Twenty-First International Conference on Machine Learning, ACM, 2004, p. 11. (Constrained clustering by learning distance metrics)
  • U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, A.J. Levine, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proc. Natl. Acad. Sci. 96 (12) (1999) 6745–6750. (Colon dataset)
  • T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286 (5439) (1999) 531–537. (Leukemia dataset)
  • G.J. Gordon, R.V. Jensen, L.-L. Hsiao, S.R. Gullans, J.E. Blumenstock, S. Ramaswamy, W.G. Richards, D.J. Sugarbaker, R. Bueno, Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma, Cancer Res. 62 (17) (2002) 4963–4967. (Lung dataset)
  • A.A. Alizadeh, M.B. Eisen, R.E. Davis, C. Ma, I.S. Lossos, A. Rosenwald, J.C. Boldrick, H. Sabet, T. Tran, X. Yu, et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature 403 (6769) (2000) 503. (Lymphoma dataset)