20190911

Dimension reduction + cluster analysis: suicide in Tokyo as an example

Loading the libraries

> library(readr)
> library(dplyr)
> library(clustrd)
> library(knitr)

Loading and displaying the data

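> # Read the source data
> mori1 <- read_csv("mori1.csv")
> # Keep code, rate, industry1, and the columns from aging through taxgain
> sui1 <- select(mori1, code, rate, industry1, aging:taxgain)
> # Keep the rows whose code starts with 13 (Tokyo), i.e. 13xxx
> sui1.tokyo <- sui1[which(as.numeric(sui1$code) %/% 1000 == 13), ]
> # Show the first ten rows
> kable(sui1.tokyo[1:10, ])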

code	rate	industry1	aging	unemployment	taxgain
13101	19.59	0.0003645	0.1761120	0.0181717	0.1134598
13102	19.77	0.0003888	0.1607417	0.0240528	0.1948065
13103	17.35	0.0006927	0.1754911	0.0273618	0.5428776
13104	20.16	0.0006748	0.1956889	0.0343030	0.3382430
13105	11.96	0.0006616	0.1909031	0.0272838	0.2612840
13106	18.35	0.0006166	0.2352163	0.0376425	0.1590275
13107	17.11	0.0007521	0.2270851	0.0371070	0.1969722
13108	16.95	0.0006992	0.2108695	0.0351113	0.4161563
13109	10.32	0.0009192	0.2022644	0.0329637	0.3759405
13110	11.20	0.0017310	0.1988243	0.0311272	0.3609426
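
The row filter above works because the first two digits of a five-digit code are the prefecture part, so integer division by 1000 leaves only that part. A minimal sketch in base R, assuming codes stored as character strings; the vector `codes` is made up for illustration and is not taken from mori1.csv:

```r
# Hypothetical five-digit codes stored as character strings (not from mori1.csv)
codes <- c("01100", "13101", "13229", "13421", "14100", "27100")

# Integer division by 1000 drops the last three digits, leaving the prefecture part
pref <- as.numeric(codes) %/% 1000

# Keep only the codes whose prefecture part is 13
tokyo_codes <- codes[pref == 13]
print(tokyo_codes)
```

Comparing numbers rather than the first two characters also copes with a code column that was read in as numeric, where a leading zero would already be lost.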

Tandem (varimax, shown again)

> # 3 clusters in 2 dimensions, varimax-rotated solution (code column dropped)
> sui1.outTandem.1 <- cluspca(sui1.tokyo[, -1], 3, 2,
+                             alpha = 0.3,
+                             rotation = "varimax",
+                             scale = TRUE, nstart = 10, center = TRUE, seed = 1234)


> summary(sui1.outTandem.1)

Solution with 3 clusters of sizes 39 (62.9%), 14 (22.6%), 9 (14.5%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
           Dim.1   Dim.2
Cluster 1 0.6141  1.5246
Cluster 2 0.4016 -0.3240
Cluster 3 0.4016 -0.3240

Variable scores:
               Dim.1   Dim.2
rate          0.0687  0.0033
industry1    -0.5536 -0.0628
aging        -0.7811  0.0093
unemployment  0.2804 -0.0602
taxgain      -0.0109  0.9962

Within cluster sum of squares by cluster:
[1] 11.4352 10.5876  3.2486
 (between_SS / total_SS =  34.01 %) 

Clustering vector:
 [1] 1 1 2 2 1 1 1 2 2 2 2 2 2 1 2 1 1 1 2 2 2 1 2 2 1 1 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 1 3 1 1

Objective criterion value: 64.996 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"

> plot(sui1.outTandem.1, cludesc = TRUE)
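
For reference, the tandem approach in its plain form is two separate steps: a PCA on the standardized variables, then k-means on the retained component scores. The sketch below does this in base R on the ten rows printed earlier only, so its clusters are not expected to match the 62-row output above; the names `x`, `scores`, and `km` are illustrative. As far as the clustrd documentation can be read, `alpha = 1` is the setting under which `cluspca()` reduces to this procedure, so the `alpha = 0.3` used above is worth checking against `?cluspca`.

```r
# The ten rows printed by kable() above (code column omitted)
x <- data.frame(
  rate         = c(19.59, 19.77, 17.35, 20.16, 11.96, 18.35, 17.11, 16.95, 10.32, 11.20),
  industry1    = c(0.0003645, 0.0003888, 0.0006927, 0.0006748, 0.0006616,
                   0.0006166, 0.0007521, 0.0006992, 0.0009192, 0.0017310),
  aging        = c(0.1761120, 0.1607417, 0.1754911, 0.1956889, 0.1909031,
                   0.2352163, 0.2270851, 0.2108695, 0.2022644, 0.1988243),
  unemployment = c(0.0181717, 0.0240528, 0.0273618, 0.0343030, 0.0272838,
                   0.0376425, 0.0371070, 0.0351113, 0.0329637, 0.0311272),
  taxgain      = c(0.1134598, 0.1948065, 0.5428776, 0.3382430, 0.2612840,
                   0.1590275, 0.1969722, 0.4161563, 0.3759405, 0.3609426)
)

# Step 1: PCA on centered, standardized variables; keep the first two components
pca <- prcomp(x, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Step 2: k-means on the component scores (3 clusters, 10 random starts)
set.seed(1234)
km <- kmeans(scores, centers = 3, nstart = 10)

print(km$cluster)            # cluster label for each of the ten rows
print(round(km$centers, 4))  # cluster centroids in the two-dimensional space
```

Because the clustering never feeds back into the choice of dimensions, the two steps can be inspected independently, which is the main practical difference from the joint methods.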

Tandem (no rotation)

> # Same settings as above, but without rotating the solution
> sui1.outTandem.2 <- cluspca(sui1.tokyo[, -1], 3, 2,
+                             alpha = 0.3,
+                             rotation = "none",
+                             scale = TRUE, nstart = 10, center = TRUE, seed = 1234)


> summary(sui1.outTandem.2)

Solution with 3 clusters of sizes 39 (62.9%), 14 (22.6%), 9 (14.5%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
            Dim.1   Dim.2
Cluster 1  0.2222 -0.4657
Cluster 2  1.2128  1.1094
Cluster 3 -2.8493  0.2924

Variable scores:
               Dim.1   Dim.2
rate          0.0634 -0.0267
industry1    -0.5264  0.1826
aging        -0.7003  0.3460
unemployment  0.2268 -0.1754
taxgain       0.4207  0.9030

Within cluster sum of squares by cluster:
[1] 11.4352 10.5876  3.2486
 (between_SS / total_SS =  82.85 %) 

Clustering vector:
 [1] 1 1 2 2 1 1 1 2 2 2 2 2 2 1 2 1 1 1 2 2 2 1 2 2 1 1 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 1 3 1 1

Objective criterion value: 64.996 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"

> plot(sui1.outTandem.2, cludesc = TRUE)
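
Varimax is an orthogonal rotation, so the two runs should describe the same cluster configuration: the variable scores and the centroids turn by the same rotation, and between_SS / total_SS should not change. The sketch below checks this in base R using only the rounded numbers printed in the summaries. It recovers the rotation from the two variable-score tables by orthogonal Procrustes, applies it to the unrotated centroids, and recomputes the ratio. Taking between_SS as the size-weighted sum of squared centroid norms is an assumption about how the summary computes it; the names `B_none`, `B_varimax`, `C_none`, and `ratio` are illustrative.

```r
# Variable scores copied from the two summaries above (rounded to 4 decimals)
vars <- c("rate", "industry1", "aging", "unemployment", "taxgain")
dims <- c("Dim.1", "Dim.2")
B_varimax <- matrix(c( 0.0687,  0.0033,
                      -0.5536, -0.0628,
                      -0.7811,  0.0093,
                       0.2804, -0.0602,
                      -0.0109,  0.9962),
                    ncol = 2, byrow = TRUE, dimnames = list(vars, dims))
B_none <- matrix(c( 0.0634, -0.0267,
                   -0.5264,  0.1826,
                   -0.7003,  0.3460,
                    0.2268, -0.1754,
                    0.4207,  0.9030),
                 ncol = 2, byrow = TRUE, dimnames = list(vars, dims))

# Orthogonal Procrustes: the orthogonal matrix R that best maps B_none onto B_varimax
s <- svd(crossprod(B_none, B_varimax))
R <- s$u %*% t(s$v)
print(max(abs(B_none %*% R - B_varimax)))  # largest residual of the fit

# Centroids, sizes and within-cluster SS from summary(sui1.outTandem.2)
C_none <- matrix(c( 0.2222, -0.4657,
                    1.2128,  1.1094,
                   -2.8493,  0.2924),
                 ncol = 2, byrow = TRUE, dimnames = list(paste("Cluster", 1:3), dims))
size <- c(39, 14, 9)
within_ss <- c(11.4352, 10.5876, 3.2486)

# Centroids expected after applying the same rotation
C_rot <- C_none %*% R
print(round(C_rot, 2))

# between_SS / total_SS in percent (assumed definition, see the text above)
ratio <- function(C) {
  between <- sum(size * rowSums(C^2))
  100 * between / (between + sum(within_ss))
}
print(c(none = ratio(C_none), rotated = ratio(C_rot)))
```

Comparing `C_rot` and the recomputed ratio with the centroid table and ratio printed by `summary(sui1.outTandem.1)` is a quick sanity check on the rotated output; `sui1.outTandem.1$centroid` gives the unrounded values.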