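1 Plotting categorical data (matrix)

1.1 Example 1: genotyping data

  • Uses the genotyping data from the qtl package (allele type per marker in an F2 segregating population).
  • Both the x and y axes are converted to factors.
  • The levels of the y axis are reversed with rev() so that the plot reflects the row order of the data visually.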
##   F2_indivisuals D10M44 D1M3 D1M75 D1M215
## 1              1      3    3     3      2
## 2              2     NA    3     3      3
## 3              3     NA    2     2      2
## 4              4      3    3     2      2
## 5              5      2    2     2      2
##   F2_indivisuals marker allele_type
## 1              1 D10M44           3
## 2              2 D10M44        <NA>
## 3              3 D10M44        <NA>
## 4              4 D10M44           3
## 5              5 D10M44           2
## 6              6 D10M44           2
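The steps listed above can be sketched as follows. This is a minimal example under stated assumptions, not the original notebook code: the toy table `geno` is typed in from the five rows printed above instead of being loaded from the qtl package, and the object names `geno`, `markers`, `long`, and `p` are illustrative.

```r
library(ggplot2)

# Toy wide table copied from the five rows printed above
# (assumption: the full qtl dataset is not reproduced here).
geno <- data.frame(
  F2_indivisuals = 1:5,
  D10M44 = c(3, NA, NA, 3, 2),
  D1M3   = c(3, 3, 2, 3, 2),
  D1M75  = c(3, 3, 2, 2, 2),
  D1M215 = c(2, 3, 2, 2, 2)
)

# Reshape wide -> long with base R: one row per (individual, marker) pair.
markers <- setdiff(names(geno), "F2_indivisuals")
long <- data.frame(
  F2_indivisuals = rep(geno$F2_indivisuals, times = length(markers)),
  marker         = rep(markers, each = nrow(geno)),
  allele_type    = unlist(geno[markers], use.names = FALSE)
)

# Both axes become factors; the fill variable is categorical as well.
long$marker      <- factor(long$marker, levels = markers)
long$allele_type <- factor(long$allele_type)

# Reverse the y levels so that individual 1 is drawn at the top,
# matching the row order of the table.
long$F2_indivisuals <- factor(long$F2_indivisuals,
                              levels = rev(geno$F2_indivisuals))

p <- ggplot(long, aes(x = marker, y = F2_indivisuals, fill = allele_type)) +
  geom_tile(colour = "white")
print(p)
```

Missing genotypes stay as NA in `allele_type`, so they are drawn in the default NA fill color and remain visible in the plot.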

1.1.1 Clustering categorical data

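2 Plotting a numeric data matrix

2.1 Discretizing continuous data and plotting it with ggplot2::geom_tile

  • Standardize each variable to the range 0 to 1: scale(x, center=min(x), scale=(max(x) - min(x)))
  • Rescale so that the minimum becomes m and the maximum becomes M: (x-min(x))/(max(x)-min(x)) * (M-m) + m
  • Discretize with cut (returns a factor) or with findInterval (returns an integer vector).
  • findInterval(x, vec, all.inside, left.open): left.open=T/F sets whether the intervals are open on the left or on the right; all.inside=T/F sets whether values at or beyond the smallest and largest break are folded into the first and last interval.
  • Draw the result as tiles with geom_tile.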
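The rescaling and discretization steps in the list above can be tried in base R as follows. This is a small sketch on a made-up vector; the values of `x`, the target range `m`/`M`, and the break points `brks` are illustrative assumptions, not values from the original analysis.

```r
x <- c(2, 4, 6, 8, 10)  # made-up input values

# Min-max standardization to [0, 1]. scale() returns a one-column matrix,
# so it is flattened back to a plain numeric vector.
x01 <- as.numeric(scale(x, center = min(x), scale = max(x) - min(x)))
x01  # 0.00 0.25 0.50 0.75 1.00

# Rescale so that the minimum becomes m and the maximum becomes M.
m <- -1
M <- 1
x_mM <- (x - min(x)) / (max(x) - min(x)) * (M - m) + m
x_mM  # -1.0 -0.5 0.0 0.5 1.0

brks <- c(0, 0.25, 0.5, 0.75, 1)

# cut() returns a factor; intervals are (a, b] by default, and
# include.lowest = TRUE closes the first interval on the left.
bins_cut <- cut(x01, breaks = brks, include.lowest = TRUE)
as.integer(bins_cut)  # 1 1 2 3 4

# findInterval() returns integer indices; intervals are [a, b) by default,
# so the maximum falls past the last break and gets index 5.
fi_default <- findInterval(x01, brks)                       # 1 2 3 4 5

# all.inside = TRUE folds that index back into the last interval.
fi_inside <- findInterval(x01, brks, all.inside = TRUE)     # 1 2 3 4 4

# left.open = TRUE switches to (a, b] intervals, which reproduces cut().
fi_left <- findInterval(x01, brks, left.open = TRUE,
                        all.inside = TRUE)                  # 1 1 2 3 4
```

With left.open = TRUE and all.inside = TRUE, findInterval gives the same bin indices as cut with include.lowest = TRUE, so the choice between the two mostly depends on whether a factor or an integer vector is more convenient downstream.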
head(nba)
Name           G   MIN   PTS   FGM   FGA   FGP    FTM  FTA  FTP    3PM  3PA  3PP    ORB  DRB  TRB  AST  STL  BLK  TO   PF
Dwyane Wade    79  38.6  30.2  10.8  22.0  0.491  7.5  9.8  0.765  1.1  3.5  0.317  1.1  3.9  5.0  7.5  2.2  1.3  3.4  2.3
LeBron James   81  37.7  28.4  9.7   19.9  0.489  7.3  9.4  0.780  1.6  4.7  0.344  1.3  6.3  7.6  7.2  1.7  1.1  3.0  1.7
Kobe Bryant    82  36.2  26.8  9.8   20.9  0.467  5.9  6.9  0.856  1.4  4.1  0.351  1.1  4.1  5.2  4.9  1.5  0.5  2.6  2.3
Dirk Nowitzki  81  37.7  25.9  9.6   20.0  0.479  6.0  6.7  0.890  0.8  2.1  0.359  1.1  7.3  8.4  2.4  0.8  0.8  1.9  2.2
Danny Granger  67  36.2  25.8  8.5   19.1  0.447  6.0  6.9  0.878  2.7  6.7  0.404  0.7  4.4  5.1  2.7  1.0  1.4  2.5  3.1
Kevin Durant   74  39.0  25.3  8.9   18.8  0.476  6.1  7.1  0.863  1.3  3.1  0.422  1.0  5.5  6.5  2.8  1.3  0.7  3.0  1.8
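Putting the steps of this section together, a discretized tile plot might look like the sketch below. It assumes only three columns (G, MIN, PTS) of the six rows printed above, typed in by hand; the names `nba6`, `vars`, `scaled`, `long`, and `class`, the four equal-width classes, and the "Blues" palette are illustrative choices. Because the scaling is done within these six rows only, the scaled values differ from those computed on the full table.

```r
library(ggplot2)

# Three columns of the six rows printed above
# (assumption: the full nba table is not reproduced here).
nba6 <- data.frame(
  Name = c("Dwyane Wade", "LeBron James", "Kobe Bryant",
           "Dirk Nowitzki", "Danny Granger", "Kevin Durant"),
  G    = c(79, 81, 82, 81, 67, 74),
  MIN  = c(38.6, 37.7, 36.2, 37.7, 36.2, 39.0),
  PTS  = c(30.2, 28.4, 26.8, 25.9, 25.8, 25.3)
)

vars <- c("G", "MIN", "PTS")

# Min-max scale each variable to [0, 1] within these six rows.
scaled <- lapply(nba6[vars], function(x) (x - min(x)) / (max(x) - min(x)))

# Long format: one row per (player, variable) pair.
# The y levels are reversed so the first player is drawn at the top.
long <- data.frame(
  Name = factor(rep(nba6$Name, times = length(vars)), levels = rev(nba6$Name)),
  k    = factor(rep(vars, each = nrow(nba6)), levels = vars),
  v    = unlist(scaled, use.names = FALSE)
)

# Discretize into four equal-width classes; cut() returns a factor.
long$class <- cut(long$v, breaks = seq(0, 1, by = 0.25), include.lowest = TRUE)

p <- ggplot(long, aes(x = k, y = Name, fill = class)) +
  geom_tile(colour = "white") +
  scale_fill_brewer(palette = "Blues")
print(p)
```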

2.2 Plotting continuous data as-is with ggplot2::geom_tile

  • ggplot2::scale_fill_gradient(low, high) assigns a color gradient between low and high automatically.
head(nba)
Name           k  v
Dwyane Wade    G  0.9473684
LeBron James   G  0.9824561
Kobe Bryant    G  1.0000000
Dirk Nowitzki  G  0.9824561
Danny Granger  G  0.7368421
Kevin Durant   G  0.8596491
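A continuous fill can be sketched directly from the long table above. The example assumes only the six printed rows (variable G), typed in by hand; the name `nba_long` and the colors passed to `low` and `high` are illustrative choices.

```r
library(ggplot2)

# The six rows of the long table printed above
# (assumption: the remaining variables are not reproduced here).
nba_long <- data.frame(
  Name = c("Dwyane Wade", "LeBron James", "Kobe Bryant",
           "Dirk Nowitzki", "Danny Granger", "Kevin Durant"),
  k    = "G",
  v    = c(0.9473684, 0.9824561, 1.0000000, 0.9824561, 0.7368421, 0.8596491)
)

# Reverse the y levels so the first player is drawn at the top.
nba_long$Name <- factor(nba_long$Name, levels = rev(nba_long$Name))

# v stays numeric, so the fill scale is continuous;
# scale_fill_gradient() interpolates between the two end colors.
p <- ggplot(nba_long, aes(x = k, y = Name, fill = v)) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue")
print(p)
```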

3 Correlation matrix

Correlation matrix (6x6)
             mpg         cyl        disp          hp        drat          wt
mpg    1.0000000  -0.8521620  -0.8475514  -0.7761684   0.6811719  -0.8676594
cyl   -0.8521620   1.0000000   0.9020329   0.8324475  -0.6999381   0.7824958
disp  -0.8475514   0.9020329   1.0000000   0.7909486  -0.7102139   0.8879799
hp    -0.7761684   0.8324475   0.7909486   1.0000000  -0.4487591   0.6587479
drat   0.6811719  -0.6999381  -0.7102139  -0.4487591   1.0000000  -0.7124406
wt    -0.8676594   0.7824958   0.8879799   0.6587479  -0.7124406   1.0000000

Correlation matrix (unpivot)
x     keys  values
mpg   mpg    1.0000000
cyl   mpg   -0.8521620
disp  mpg   -0.8475514
hp    mpg   -0.7761684
drat  mpg    0.6811719
wt    mpg   -0.8676594
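The two tables above can be reproduced and plotted roughly as follows. The printed values match the first six columns of base R's mtcars dataset, so the sketch assumes that dataset; the unpivot via as.table(), the diverging scale_fill_gradient2() scale, and the names `cm` and `cm_long` are illustrative choices and may differ from the original notebook.

```r
library(ggplot2)

# 6x6 correlation matrix (assumption: the first six columns of mtcars).
cm <- cor(mtcars[, 1:6])

# Unpivot: as.table() + as.data.frame() gives one row per matrix cell,
# with the row index varying fastest. Column names follow the table above.
cm_long <- as.data.frame(as.table(cm))
names(cm_long) <- c("x", "keys", "values")

# Reverse the y levels so the diagonal runs from top-left to bottom-right.
cm_long$keys <- factor(cm_long$keys, levels = rev(levels(cm_long$keys)))

# A diverging scale centered at 0 separates negative from positive correlations.
p <- ggplot(cm_long, aes(x = x, y = keys, fill = values)) +
  geom_tile(colour = "white") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1))
print(p)
```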

4 Environment

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.1
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] ja_JP.UTF-8/ja_JP.UTF-8/ja_JP.UTF-8/C/ja_JP.UTF-8/ja_JP.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] fields_9.6         maps_3.3.0         spam_2.2-1        
##  [4] dotCall64_1.0-0    rsko_0.1.0         bindrcpp_0.2.2    
##  [7] RColorBrewer_1.1-2 qtl_1.42-8         tidyr_0.8.2       
## [10] dplyr_0.7.8        ggplot2_3.1.0      knitr_1.21        
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6          bit64_0.9-7           webshot_0.5.1        
##  [4] httr_1.4.0            prabclus_2.2-7        Rgraphviz_2.24.0     
##  [7] dynamicTreeCut_1.63-1 tools_3.5.1           R6_2.3.0             
## [10] KernSmooth_2.23-15    DBI_1.0.0             lazyeval_0.2.1       
## [13] BiocGenerics_0.26.0   colorspace_1.4-0      trimcluster_0.1-2.1  
## [16] nnet_7.3-12           withr_2.1.2           tidyselect_0.2.5     
## [19] gridExtra_2.3         bit_1.1-14            compiler_3.5.1       
## [22] rvest_0.3.2           graph_1.58.2          Biobase_2.40.0       
## [25] pacman_0.5.0          xml2_1.2.0            labeling_0.3         
## [28] diptest_0.75-7        caTools_1.17.1.1      KEGGgraph_1.40.0     
## [31] scales_1.0.0          DEoptimR_1.0-8        mvtnorm_1.0-8        
## [34] robustbase_0.93-3     readr_1.3.1           stringr_1.3.1        
## [37] digest_0.6.18         rmarkdown_1.11        XVector_0.20.0       
## [40] pkgconfig_2.0.2       htmltools_0.3.6       highr_0.7            
## [43] rlang_0.3.1           rstudioapi_0.9.0      RSQLite_2.1.1        
## [46] bindr_0.1.1           zoo_1.8-4             mclust_5.4.2         
## [49] gtools_3.8.1          dendextend_1.9.0      magrittr_1.5         
## [52] modeltools_0.2-22     kableExtra_1.0.0      Rcpp_1.0.0           
## [55] munsell_0.5.0         S4Vectors_0.18.3      viridis_0.5.1        
## [58] pathview_1.20.0       stringi_1.2.4         whisker_0.3-2        
## [61] yaml_2.2.0            MASS_7.3-51.1         zlibbioc_1.26.0      
## [64] flexmix_2.3-14        gplots_3.0.1          plyr_1.8.4           
## [67] blob_1.1.1            parallel_3.5.1        gdata_2.18.0         
## [70] ggrepel_0.8.0         crayon_1.3.4          lattice_0.20-38      
## [73] Biostrings_2.48.0     hms_0.4.2             KEGGREST_1.20.2      
## [76] pillar_1.3.1          fpc_2.1-11.1          stats4_3.5.1         
## [79] XML_3.98-1.16         glue_1.3.0            evaluate_0.12        
## [82] png_0.1-7             gtable_0.2.0          purrr_0.2.5          
## [85] kernlab_0.9-27        amap_0.8-16           assertthat_0.2.0     
## [88] xfun_0.4              class_7.3-15          viridisLite_0.3.0    
## [91] tibble_2.0.1          AnnotationDbi_1.42.1  memoise_1.1.0        
## [94] IRanges_2.14.12       cluster_2.0.7-1