非監督式學習是機器學習中重要的一支,也是目前機器學習中發展核心,其中一議題便是分群(clustering),將大量資料經機器透過一些規則自我形成一群聚。常見的分群演算法,如 k-means, SOM 等為硬分群 (hard partitions, hard clustering),一筆資料只會被分配到單一群聚中,在資料具代表性且資料量多時,硬分群是效果佳的演算法;但在實務上,資料蒐集本身可能就是問題,甚至是需經長時間來收集不同樣態特徵的樣本,此時硬分群的結果就不盡人意。此外,資料中多有不同樣態的雜訊(noise),硬分群在資料前處理不佳下,很容易受到雜訊的影響,也會對分群的結果大打折扣。針對資料蒐集、資料樣態或雜訊處理等議題,軟分群(soft clustering)的概念就被提出,其中最有名的就是模糊 C 平均(Fuzzy C-Means, FCM)演算法,最早由 J.C. Dunn 於 1973 提出,經 James C.Bezdek 進行優化。軟分群不同於硬分群有底下數點
Fuzzy C-means 演算法與 k-means 相似,不同於允許資料不同程度上屬於多個群聚,而 FCM 要解決的問題是最小化加權後平方誤差(minimization of a weighted square error function),即最小化底下的目標函式
\[ J_m = \sum_{i=1}^{N}\sum_{j=1}^{C}u_{ij}^m||x_i-c_j||^2\ ,\ 1 \leq m < \infty \]
其中 \(m \geq 1\),\(u_{ij}\) 代表資料 \(X_i\) 屬於分群 \(j\) 的程度(degree of membership), \(x_i\) 是資料 \(x\) 全部 \(d\) 維度中第 \(i\) 維的資料,\(c_j\) 表示群集的中心(亦有 \(d\) 維度資料),\(||*||\) 表示資料與中心點間的相似度。而模糊分群法是一反覆計算最佳化目標函式的過程,並不斷更新成員程度(\(u_{ij}\))與中心點 \(c_j\),如下
\[ u_{ij} = \frac{1}{\sum_{k=1}^{C}(\frac{||x_i-c_j||}{||x_i-c_k||})^{(\frac{2}{m-1})}}, \\ c_j = \frac{\sum_{i=1}^Nu_{ij}^m*x_i}{\sum_{i=1}^Nu_{ij}^m} \]
而反覆計算停止的條件為當 \(max_{ij}\{\|u_{ij}^{(k+1)}-u_{ij}^{(k)}|\} < \varepsilon\),而 \(\varepsilon\) 為介於 0 至 1 的終止條件,而 \(k\) 為反覆的步驟,這過程會收斂於目標函式 \(J_m\) 的區域最小值(local minimum)或鞍點(saddle point)。
FCM 演算法如下:
FCM 最常用於模式判斷,如時間序列資料等。
可以透過套件 e1071 中的函式 mfuzz 來實作模糊分群演算法。
packageName <- c("e1071")
for(i in 1:length(packageName)) {
if(!(packageName[i] %in% rownames(installed.packages()))) {
install.packages(packageName[i])
}
}
library("e1071")
library("Mfuzz")
## Loading required package: Biobase
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, cbind, colMeans,
## colnames, colSums, do.call, duplicated, eval, evalq, Filter,
## Find, get, grep, grepl, intersect, is.unsorted, lapply,
## lengths, Map, mapply, match, mget, order, paste, pmax,
## pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce,
## rowMeans, rownames, rowSums, sapply, setdiff, sort, table,
## tapply, union, unique, unsplit, which, which.max, which.min
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
底下透過套件內建的酵母菌細胞週期分子資料來實作軟分群演算法。此酵母菌細胞週期分子資料收集 6178 個基因於 160 分鐘內的 17 個時間點的表現資料,並經過整理後列出 3000 個基因於 17 個時間點的表現資料。
data(yeast)
str(yeast)
## Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
## ..@ experimentData :Formal class 'MIAME' [package "Biobase"] with 13 slots
## .. .. ..@ name : chr ""
## .. .. ..@ lab : chr ""
## .. .. ..@ contact : chr ""
## .. .. ..@ title : chr ""
## .. .. ..@ abstract : chr ""
## .. .. ..@ url : chr ""
## .. .. ..@ pubMedIds : chr ""
## .. .. ..@ samples : list()
## .. .. ..@ hybridizations : list()
## .. .. ..@ normControls : list()
## .. .. ..@ preprocessing : list()
## .. .. ..@ other : list()
## .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
## .. .. .. .. ..@ .Data:List of 1
## .. .. .. .. .. ..$ : int [1:3] 1 0 0
## ..@ assayData :<environment: 0x00000000178b4bd0>
## ..@ phenoData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
## .. .. ..@ varMetadata :'data.frame': 3 obs. of 1 variable:
## .. .. .. ..$ labelDescription: chr [1:3] "NULL" "NULL" "NULL"
## .. .. ..@ data :'data.frame': 17 obs. of 3 variables:
## .. .. .. ..$ index: int [1:17] 0 1 2 3 4 5 6 7 8 9 ...
## .. .. .. ..$ label: Factor w/ 17 levels "cdc28_0","cdc28_10",..: 1 2 10 11 12 13 14 15 16 17 ...
## .. .. .. ..$ time : num [1:17] 0 10 20 30 40 50 60 70 80 90 ...
## .. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
## .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
## .. .. .. .. ..@ .Data:List of 1
## .. .. .. .. .. ..$ : int [1:3] 1 1 0
## ..@ featureData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
## .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
## .. .. .. ..$ labelDescription: chr(0)
## .. .. ..@ data :'data.frame': 3000 obs. of 0 variables
## .. .. ..@ dimLabels : chr [1:2] "featureNames" "featureColumsn"
## .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
## .. .. .. .. ..@ .Data:List of 1
## .. .. .. .. .. ..$ : int [1:3] 1 1 0
## ..@ annotation : chr [1:6178] "YAL001C" "YAL002W" "YAL003W" "YAL004W" ...
## ..@ protocolData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
## .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
## .. .. .. ..$ labelDescription: chr(0)
## .. .. ..@ data :'data.frame': 17 obs. of 0 variables
## .. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
## .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
## .. .. .. .. ..@ .Data:List of 1
## .. .. .. .. .. ..$ : int [1:3] 1 1 0
## ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
## .. .. ..@ .Data:List of 4
## .. .. .. ..$ : int [1:3] 2 10 0
## .. .. .. ..$ : int [1:3] 2 5 5
## .. .. .. ..$ : int [1:3] 1 3 0
## .. .. .. ..$ : int [1:3] 1 0 0
此資料是使用 microchip (Affymetrix chips) 產出,包含如分析資料(assayData), 表現資料 (phenoData), 註解資料 (annotation), 與使用的 protocol (protocolData)等內容。
其中 cdc28_0, cdc28_10, … , cdc28_160 代表 17 個時間的標籤。可以透過 yeast$time 來取得時間(單位分鐘)。
# 3000 (genes) x 17 (time points)
head(yeast@assayData[["exprs"]], 10)
## cdc28_0 cdc28_10 cdc28_20 cdc28_30 cdc28_40 cdc28_50 cdc28_60
## YDR132C 0.19 0.30 -0.29 0.29 -0.31 0.23 0.20
## YMR012W -0.15 -0.15 -0.04 -0.28 -0.39 0.03 0.22
## YLR214W 0.38 0.30 -0.68 -0.52 -0.43 -0.13 -0.17
## YLR116W 0.17 0.06 -0.21 0.19 0.33 0.44 0.46
## YDR203W 0.85 -0.10 -0.56 -0.31 -0.43 0.00 -0.34
## YEL059C-A 0.45 0.20 0.06 0.10 -0.21 -0.08 -0.27
## YHR046C 0.37 0.31 0.07 -0.14 -0.69 -0.10 0.21
## YJL053W 0.38 -0.14 -0.14 0.17 -0.21 0.04 -0.06
## YBL021C -0.25 -0.09 0.34 -0.01 0.16 -0.11 -0.22
## YHR034C 0.89 -0.02 -0.22 0.02 0.17 0.37 0.15
## cdc28_70 cdc28_80 cdc28_90 cdc28_100 cdc28_110 cdc28_120
## YDR132C -0.08 0.19 -0.51 0.00 -0.31 0.11
## YMR012W 0.04 -0.15 0.37 0.47 -0.10 -0.09
## YLR214W 0.26 -0.03 -0.34 -0.01 -0.20 0.10
## YLR116W 0.38 -0.15 -0.03 0.04 -0.42 -0.15
## YDR203W 0.17 0.40 -0.37 0.15 0.24 0.24
## YEL059C-A -0.01 -0.29 0.41 -0.08 -0.22 -0.27
## YHR046C 0.05 0.24 -0.16 -0.18 -0.09 0.16
## YJL053W 0.07 0.00 -0.23 0.07 -0.01 0.12
## YBL021C 0.02 -0.22 -0.09 -0.01 0.04 -0.02
## YHR034C 0.17 0.04 -0.15 -0.40 -0.15 -0.02
## cdc28_130 cdc28_140 cdc28_150 cdc28_160
## YDR132C -0.02 0.20 0.36 -0.54
## YMR012W NA -0.04 0.07 0.19
## YLR214W NA 0.45 0.40 0.63
## YLR116W 0.02 NA -0.51 -0.61
## YDR203W 0.17 -0.12 -0.02 0.02
## YEL059C-A NA -0.30 0.25 0.26
## YHR046C -0.10 0.05 NA NA
## YJL053W NA -0.04 -0.14 0.12
## YBL021C NA 0.01 0.27 0.17
## YHR034C 0.04 -0.23 -0.38 -0.27
mfuzz 使用類別 ExpressionSet 產出的物件,若為自己的資料,可以透過套件 Biobase 來產生物件。相關內容可以參考 https://www.bioconductor.org/packages/3.7/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf。
library("Biobase")
ExpressionSet 物件主要包含 3 個屬性,分別為 assayData, phenoData, featureData,其中最小資料集至少需包含 assayData 屬性。
set.seed(123)
# 產生模擬資料
miniDataSet <- matrix(cbind(
rnorm(1000, mean = 0, sd = 2),
rnorm(1000, mean = 100, sd = 10),
rnorm(1000, mean = 20, sd = 5),
rnorm(1000, mean = 10, sd = 10)
), ncol=4)
colnames(miniDataSet) <- c("feat1","feat2","feat3","feat4")
rownames(miniDataSet) <- paste(rep("data_",1000),seq(1,1000,1),sep="")
head(miniDataSet, 5)
## feat1 feat2 feat3 feat4
## data_1 -1.1209513 90.04201 17.44198 8.496925
## data_2 -0.4603550 89.60045 21.18469 6.722429
## data_3 3.1174166 99.82020 17.29205 -4.481653
## data_4 0.1410168 98.67825 26.09614 3.027154
## data_5 0.2585755 74.50657 20.87068 35.984902
# 產生最小資料集
miniDataSet.expset <- ExpressionSet(assayData = miniDataSet)
assayData 為記錄特徵值的表,phenoData 為記錄特徵為何模式或處理(準備資料時須注意 phenoData 的列名需與 assayData 的行名相同),featureData 為紀錄每筆資料的對應資訊或真實資訊等(準備資料時須注意 featureData 的列名需與 assayData 的列名相同)。需要注意 phenoData 與 featureData 需經過函式 AnnotatedDataFrame 將資料進行轉換並記錄 meta data 等屬性。
# 準備 assayData, 參考上述產生最小分析用的資料集
full.assayData <- miniDataSet
head(full.assayData, 5)
## feat1 feat2 feat3 feat4
## data_1 -1.1209513 90.04201 17.44198 8.496925
## data_2 -0.4603550 89.60045 21.18469 6.722429
## data_3 3.1174166 99.82020 17.29205 -4.481653
## data_4 0.1410168 98.67825 26.09614 3.027154
## data_5 0.2585755 74.50657 20.87068 35.984902
# 準備 phenoData
full.phenoData <- as.data.frame(rbind(
c("control","1"),
c("control","2"),
c("test","1"),
c("test","2")
))
colnames(full.phenoData) <- c("treatment","sequence")
rownames(full.phenoData) <- colnames(miniDataSet)
full.phenoData.annot <- AnnotatedDataFrame(full.phenoData)
head(full.phenoData, 5)
## treatment sequence
## feat1 control 1
## feat2 control 2
## feat3 test 1
## feat4 test 2
# 準備 featureData
tureRowNames <- paste(rep("species_",1000),seq(1,1000,1),sep="")
full.featureData <- as.data.frame(tureRowNames, ncol=1)
colnames(full.featureData) <- c("symbol")
rownames(full.featureData) <- rownames(miniDataSet)
full.featureData.annot <- AnnotatedDataFrame(full.featureData)
head(full.featureData, 5)
## symbol
## data_1 species_1
## data_2 species_2
## data_3 species_3
## data_4 species_4
## data_5 species_5
# 產生完整的 ExpressionSet 物件
fullDataSet.expset <- ExpressionSet(assayData = full.assayData, phenoData = full.phenoData.annot, featureData = full.featureData.annot)
當準備好資料結構後便可以接續下方的分析。
# the prototype of filter.NA
# eset: 由類別 ExpressionSet 產生的物件
filter.NA(eset,thres=0.25)
# 移除含有超過 25% NA 的資料
yeast.filter.na <- filter.NA(yeast, thres = 0.25)
## 49 genes excluded.
FCM 與其他許多分群演算法相同,不允許 NA 的存在,需要透過填值方式處理 NA。
# the prototype of fill.NA
# eset: 由類別 ExpressionSet 產生的物件
# mode: 填值計算方法,包含有 mean, median, knn (需填後續的 k 值), knnw (需填後續的 k 值) 等
# |- knnw: 與 knn 相同,但加入與資料的距離作為加權數
# k: 若模式選擇 knn, knnw 時需指派有多少個鄰近值
fill.NA(eset,mode="mean",k=10)
yeast.fill.na <- fill.NA(yeast.filter.na, mode = "mean")
head(yeast.fill.na@assayData[["exprs"]], 10)
## cdc28_0 cdc28_10 cdc28_20 cdc28_30 cdc28_40 cdc28_50 cdc28_60
## YDR132C 0.19 0.30 -0.29 0.29 -0.31 0.23 0.20
## YMR012W -0.15 -0.15 -0.04 -0.28 -0.39 0.03 0.22
## YLR214W 0.38 0.30 -0.68 -0.52 -0.43 -0.13 -0.17
## YLR116W 0.17 0.06 -0.21 0.19 0.33 0.44 0.46
## YDR203W 0.85 -0.10 -0.56 -0.31 -0.43 0.00 -0.34
## YEL059C-A 0.45 0.20 0.06 0.10 -0.21 -0.08 -0.27
## YHR046C 0.37 0.31 0.07 -0.14 -0.69 -0.10 0.21
## YJL053W 0.38 -0.14 -0.14 0.17 -0.21 0.04 -0.06
## YBL021C -0.25 -0.09 0.34 -0.01 0.16 -0.11 -0.22
## YHR034C 0.89 -0.02 -0.22 0.02 0.17 0.37 0.15
## cdc28_70 cdc28_80 cdc28_90 cdc28_100 cdc28_110 cdc28_120
## YDR132C -0.08 0.19 -0.51 0.00 -0.31 0.11
## YMR012W 0.04 -0.15 0.37 0.47 -0.10 -0.09
## YLR214W 0.26 -0.03 -0.34 -0.01 -0.20 0.10
## YLR116W 0.38 -0.15 -0.03 0.04 -0.42 -0.15
## YDR203W 0.17 0.40 -0.37 0.15 0.24 0.24
## YEL059C-A -0.01 -0.29 0.41 -0.08 -0.22 -0.27
## YHR046C 0.05 0.24 -0.16 -0.18 -0.09 0.16
## YJL053W 0.07 0.00 -0.23 0.07 -0.01 0.12
## YBL021C 0.02 -0.22 -0.09 -0.01 0.04 -0.02
## YHR034C 0.17 0.04 -0.15 -0.40 -0.15 -0.02
## cdc28_130 cdc28_140 cdc28_150 cdc28_160
## YDR132C -2.000000e-02 0.200000 3.600000e-01 -5.400000e-01
## YMR012W -3.035766e-18 -0.040000 7.000000e-02 1.900000e-01
## YLR214W 6.250000e-04 0.450000 4.000000e-01 6.300000e-01
## YLR116W 2.000000e-02 0.000625 -5.100000e-01 -6.100000e-01
## YDR203W 1.700000e-01 -0.120000 -2.000000e-02 2.000000e-02
## YEL059C-A 5.421011e-19 -0.300000 2.500000e-01 2.600000e-01
## YHR046C -1.000000e-01 0.050000 1.851727e-18 1.851727e-18
## YJL053W -1.192622e-18 -0.040000 -1.400000e-01 1.200000e-01
## YBL021C -6.250000e-04 0.010000 2.700000e-01 1.700000e-01
## YHR034C 4.000000e-02 -0.230000 -3.800000e-01 -2.700000e-01
許多分群工具皆有提供資料過濾步驟來移除變異低的資料(如小變化的基因表現等),目的為挑出資料變異較大的資料供後續分析。
yeast.filter.trim <- filter.std(yeast.fill.na, min.std = 0)
## 0 genes excluded.
由上圖可以看出 Y 軸為標準差,X 軸為基因,可以看出此 3000 基因皆位於 \(\pm\ 2\ S.D.\) 之間,沒有基因被排除。
因分群的距離計算是在 Euclidian 空間中,資料的特徵值(如基因的表現值)需要經過標準化處理使之相同樣本下的資料平均值為 0 及相對於 1 個標準差的對比值,促使可以讓樣本間來比較。
yeast.std <- standardise(yeast.filter.trim)
head(yeast.std@assayData[["exprs"]], 10)
## cdc28_0 cdc28_10 cdc28_20 cdc28_30 cdc28_40
## YDR132C 0.6507079 1.02860350 -0.9982910 0.99424935 -1.0669993
## YMR012W -0.6842079 -0.68420789 -0.1824554 -1.27718807 -1.7789405
## YLR214W 1.0250819 0.80891968 -1.8390679 -1.40674342 -1.1635609
## YLR116W 0.5310348 0.18615610 -0.6603643 0.59373997 1.0326765
## YDR203W 2.4362918 -0.28473950 -1.6022915 -0.88623064 -1.2299399
## YEL059C-A 1.8186458 0.80828701 0.2424861 0.40414350 -0.8487014
## YHR046C 1.5168521 1.27087610 0.2869720 -0.57394404 -2.8287242
## YJL053W 2.4834745 -0.91496428 -0.9149643 1.11102806 -1.3724464
## YBL021C -1.5153805 -0.54310629 2.0698806 -0.05696919 0.9760721
## YHR034C 2.8693556 -0.06642027 -0.7116457 0.06262483 0.5465439
## cdc28_50 cdc28_60 cdc28_70 cdc28_80 cdc28_90
## YDR132C 0.788124487 0.6850621 -0.27685399 6.507079e-01 -1.75408219
## YMR012W 0.136841579 1.0035049 0.18245544 -6.842079e-01 1.68771281
## YLR214W -0.352952430 -0.4610336 0.70083856 -8.274961e-02 -0.92037835
## YLR116W 1.377555122 1.4402603 1.18943949 -4.722486e-01 -0.09601736
## YDR203W 0.001684849 -0.9721579 0.48860624 1.147382e+00 -1.05808525
## YEL059C-A -0.323314804 -1.0911875 -0.04041435 -1.172016e+00 1.65698837
## YHR046C -0.409960031 0.8609161 0.20498002 9.839041e-01 -0.65593605
## YJL053W 0.261418366 -0.3921275 0.45748214 7.794335e-18 -1.50315561
## YBL021C -0.664640567 -1.3330791 0.12533222 -1.333079e+00 -0.54310629
## YHR034C 1.191769402 0.4820214 0.54654393 1.271474e-01 -0.48581683
## cdc28_100 cdc28_110 cdc28_120 cdc28_130 cdc28_140
## YDR132C -0.002020832 -1.06699931 0.37587476 -0.07072912 0.68506205
## YMR012W 2.143851401 -0.45613860 -0.41052474 0.00000000 -0.18245544
## YLR214W -0.028709049 -0.54209440 0.26851405 0.00000000 1.21422391
## YLR116W 0.123450886 -1.31876899 -0.47224863 0.06074567 0.00000000
## YDR203W 0.431321374 0.68910329 0.68910329 0.48860624 -0.34202437
## YEL059C-A -0.323314804 -0.88911571 -1.09118746 0.00000000 -1.21243051
## YHR046C -0.737928056 -0.36896403 0.65593605 -0.40996003 0.20498002
## YJL053W 0.457482141 -0.06535459 0.78425510 0.00000000 -0.26141837
## YBL021C -0.056969191 0.24686650 -0.11773633 0.00000000 0.06456508
## YHR034C -1.292348667 -0.48581683 -0.06642027 0.12714737 -0.74390702
## cdc28_150 cdc28_160
## YDR132C 1.234728e+00 -1.857145e+00
## YMR012W 3.192970e-01 8.666633e-01
## YLR214W 1.079123e+00 1.700589e+00
## YLR116W -1.600942e+00 -1.914468e+00
## YDR203W -5.560002e-02 5.896972e-02
## YEL059C-A 1.010359e+00 1.050773e+00
## YHR046C -5.767468e-22 -5.767468e-22
## YJL053W -9.149643e-01 7.842551e-01
## YBL021C 1.644511e+00 1.036839e+00
## YHR034C -1.227826e+00 -8.729521e-01
需要注意的是,標準化(Standardisation)與正規化(Normalisation)並不相同,標準化目的使資料間(如基因間)可以比較,正規化目的使樣本間(如時間點間)可以比較。
# the prototype of mfuzz
# eset: 類別 ExpressionSet 的物件
# centers: 分群數目
# m: 模糊化參數
mfuzz(eset, centers, m)
yeast.fcm <- mfuzz(yeast.std, centers = 16, m = 1.25)
summary(yeast.fcm)
## Length Class Mode
## centers 272 -none- numeric
## size 16 -none- numeric
## cluster 2951 -none- numeric
## membership 47216 -none- numeric
## iter 1 -none- numeric
## withinerror 1 -none- numeric
## call 5 -none- call
yeast.fcm$centers
## cdc28_0 cdc28_10 cdc28_20 cdc28_30 cdc28_40 cdc28_50
## 1 -0.8191853 -1.1585300 -1.33535349 -0.17894616 0.51924541 0.72642676
## 2 0.3496005 0.4112077 0.45338229 0.73760093 0.69739062 0.49657820
## 3 -1.0292662 0.4315193 0.65516601 0.04000151 -0.11010745 -0.29183382
## 4 -0.4318914 -0.1183012 1.41365258 0.84785538 0.02116613 -0.28289756
## 5 -1.7051179 -0.6540063 -0.52103457 -0.41646787 0.05008171 -0.18260465
## 6 -1.2280581 -1.0134957 0.02129761 0.63851835 0.83781937 0.63174320
## 7 0.7382132 0.1156997 -0.51019951 -0.75668056 -1.11784917 -0.77539506
## 8 -0.8025271 -0.5454651 0.25905375 -0.20192735 0.02064164 -1.02932210
## 9 1.1279009 -0.4812007 -1.44698429 -0.61142731 -0.45204208 -0.10895502
## 10 0.5492421 0.4632392 -0.10371295 0.13173940 0.07475805 0.30093595
## 11 -1.1861986 0.7523508 0.62368794 0.33792580 0.51740675 0.32070628
## 12 -0.3058531 -1.2363419 -1.55812759 -0.76918882 -0.12748589 -0.02790586
## 13 1.7726572 0.9228351 0.28100744 -0.06793209 -0.66342780 -0.61810470
## 14 -0.3537609 -0.4820002 -1.03725568 -1.02725174 -0.72822222 -0.62115326
## 15 1.4566835 1.1695542 0.83045853 0.74134335 0.15186505 0.02577311
## 16 0.4932322 0.1828821 -0.65993682 -0.15830338 0.14354193 0.42803495
## cdc28_60 cdc28_70 cdc28_80 cdc28_90 cdc28_100 cdc28_110
## 1 0.67092942 0.69051110 0.44985395 -0.5676170 -0.3607560 -0.102892442
## 2 0.46596245 0.43708703 -0.10141515 -0.4636435 -0.6471328 -0.650307709
## 3 0.06273379 -0.36248901 -0.28649224 0.4029379 -0.5905265 -0.201501666
## 4 -0.57702878 -0.85877097 -0.68307735 0.3971722 0.9275028 0.750349421
## 5 0.12640916 0.04206082 0.05842378 0.6922136 0.2255243 0.168427808
## 6 0.41430194 0.18823710 -0.11919221 -0.3007237 -0.2022945 -0.009800655
## 7 -0.75214229 -0.57184599 0.14585875 0.5462155 0.5532856 0.558543567
## 8 -0.38202519 -0.49172483 -0.63942232 2.1256625 0.5161059 -0.001910334
## 9 -0.10784420 0.13896253 0.47990858 -0.1177825 0.2038275 0.234127109
## 10 0.18282315 0.26044749 0.21290416 -1.6943497 -0.9103889 -0.273813026
## 11 0.67827457 0.64755014 0.06624383 -0.1681345 -0.6234257 -0.762582722
## 12 0.10949205 0.21952325 0.48713552 0.4817109 0.4482339 0.369567035
## 13 -0.53751760 -0.52822238 -0.13832783 -0.1099836 -0.1564419 -0.105636748
## 14 -0.43451415 -0.34614258 0.18627389 1.1247483 0.7210737 0.558075666
## 15 -0.10757615 -0.12051221 -0.19722913 -0.5187420 -0.5361025 -0.533981112
## 16 0.58991460 0.80971129 0.41117132 -0.2585145 -0.4056090 -0.516225141
## cdc28_120 cdc28_130 cdc28_140 cdc28_150 cdc28_160
## 1 0.39439959 0.31475222 0.21833736 0.24872993 0.2900945
## 2 -0.33648267 -0.26141088 -0.26233805 -0.67907711 -0.6470019
## 3 -0.31612933 0.25975500 0.13457147 0.94885249 0.2528087
## 4 0.30822046 -0.06876285 -0.29242951 -0.76903368 -0.5837257
## 5 0.06007159 0.38946976 0.31952367 0.78694808 0.5600771
## 6 0.22999071 0.12968223 0.07631706 -0.11154722 -0.1827954
## 7 0.32011226 0.21840537 0.19498066 0.46594254 0.6268554
## 8 -0.57068891 0.25904795 0.25149927 0.65994623 0.5730560
## 9 0.33673422 0.15181915 0.18037946 0.06818679 0.4043899
## 10 0.16722023 0.21871045 0.01662304 0.28038647 0.1232349
## 11 -0.40602279 -0.06350813 -0.01051805 -0.17677039 -0.5469853
## 12 0.37668786 0.29994190 0.29556605 0.33543346 0.6016112
## 13 -0.10586576 -0.14752570 -0.06828424 0.09215782 0.1786128
## 14 0.18455663 0.29586071 0.32898533 0.78649653 0.8442300
## 15 -0.47190497 -0.41415003 -0.39345138 -0.56129285 -0.5207354
## 16 -0.33404594 -0.30143562 -0.01809130 -0.22735896 -0.1789677
head(yeast.fcm$membership, 5)
## 1 2 3 4 5
## YDR132C 0.033028690 0.091047608 0.0272342898 0.0072161749 0.009746862
## YMR012W 0.013064433 0.004648061 0.0317682381 0.0128910868 0.136143859
## YLR214W 0.021262985 0.004720011 0.0134056903 0.0012640257 0.016095409
## YLR116W 0.018496989 0.567353910 0.0042186064 0.0045082888 0.002981605
## YDR203W 0.004296047 0.002029703 0.0009991169 0.0008653977 0.001289807
## 6 7 8 9 10
## YDR132C 0.025435025 0.016152389 0.0024919454 0.048377884 0.384844328
## YMR012W 0.011935139 0.083921713 0.1763958042 0.031640131 0.003557096
## YLR214W 0.003750679 0.181247689 0.0045603759 0.427298861 0.061071586
## YLR116W 0.036037039 0.001337328 0.0013323348 0.007197925 0.017362038
## YDR203W 0.001102459 0.060562272 0.0005371039 0.838000204 0.015208288
## 11 12 13 14 15
## YDR132C 0.0511866317 0.012329002 0.050887226 0.006803118 0.074120930
## YMR012W 0.0086862169 0.104715888 0.009647844 0.349921194 0.002832611
## YLR214W 0.0042315288 0.064867028 0.057617758 0.081616402 0.005741754
## YLR116W 0.0869577573 0.004017206 0.004239055 0.001107516 0.034789363
## YDR203W 0.0006231445 0.016836355 0.030822104 0.008322579 0.003470340
## 16
## YDR132C 0.15909790
## YMR012W 0.01823068
## YLR214W 0.05124822
## YLR116W 0.20806304
## YDR203W 0.01503508
由上可以看出該資料於屬於各分群中的機率,針對單一筆資料(如單一基因)的所有分群的加總值為 1。
yeast.fcm$size
## [1] 184 149 172 198 181 147 175 264 182 189 168 187 182 203 193 177
png(filename="images/mfuzz.png", width = 1024, height = 1024)
mfuzz.plot2(yeast.std, cl=yeast.fcm
, mfrow=c(4,4), time.labels=seq(0,160,10), x11 = FALSE
, cex.lab=1.8, cex.axis=1.8
)
dev.off()
## png
## 2
可以從所屬分群表中找出每筆資料最有可能的分群結果(即最大機率),即可以當作硬分群的結果。
# 列出資料於各群的機率
head(yeast.fcm$membership[1:10,])
## 1 2 3 4 5
## YDR132C 0.033028690 0.091047608 0.0272342898 0.0072161749 0.009746862
## YMR012W 0.013064433 0.004648061 0.0317682381 0.0128910868 0.136143859
## YLR214W 0.021262985 0.004720011 0.0134056903 0.0012640257 0.016095409
## YLR116W 0.018496989 0.567353910 0.0042186064 0.0045082888 0.002981605
## YDR203W 0.004296047 0.002029703 0.0009991169 0.0008653977 0.001289807
## YEL059C-A 0.003321228 0.021635052 0.0588536312 0.0135463763 0.009529987
## 6 7 8 9 10
## YDR132C 0.025435025 0.016152389 0.0024919454 0.048377884 0.384844328
## YMR012W 0.011935139 0.083921713 0.1763958042 0.031640131 0.003557096
## YLR214W 0.003750679 0.181247689 0.0045603759 0.427298861 0.061071586
## YLR116W 0.036037039 0.001337328 0.0013323348 0.007197925 0.017362038
## YDR203W 0.001102459 0.060562272 0.0005371039 0.838000204 0.015208288
## YEL059C-A 0.004975952 0.098345330 0.0699972732 0.027929787 0.015051983
## 11 12 13 14 15
## YDR132C 0.0511866317 0.012329002 0.050887226 0.006803118 0.074120930
## YMR012W 0.0086862169 0.104715888 0.009647844 0.349921194 0.002832611
## YLR214W 0.0042315288 0.064867028 0.057617758 0.081616402 0.005741754
## YLR116W 0.0869577573 0.004017206 0.004239055 0.001107516 0.034789363
## YDR203W 0.0006231445 0.016836355 0.030822104 0.008322579 0.003470340
## YEL059C-A 0.0098121426 0.008431723 0.512016985 0.026613941 0.091321025
## 16
## YDR132C 0.15909790
## YMR012W 0.01823068
## YLR214W 0.05124822
## YLR116W 0.20806304
## YDR203W 0.01503508
## YEL059C-A 0.02861758
# 列出前 10 筆的硬分群
yeast.fcm$cluster[1:10]
## YDR132C YMR012W YLR214W YLR116W YDR203W YEL059C-A YHR046C
## 10 14 9 2 9 13 13
## YJL053W YBL021C YHR034C
## 9 3 16
png(filename="images/mfuzz_1.png", width = 500, height = 500)
mfuzz.plot2(yeast.std, cl=yeast.fcm
, mfrow=c(1,1), time.labels=seq(0,160,10), x11 = FALSE
, cex.lab=1.8, cex.axis=1.8, single = 1
)
## NULL
dev.off()
## png
## 2