はじめに

九州心理学会にエントリーした時の，抄録原稿には次のように書きました。「旅行経験が豊富であるほど，旅行ついて様々な楽しみを見出すことができ，自分自身の旅行スタイルを作り上げていくことができるだろう。こうした旅行者の心理を考えるときは，デモグラフィック要因や社会的な要因から注意深く対象をクラスタリングした上で，それぞれのクラスターがもつ特徴に沿った解釈をする必要がある。本研究では旅行者のクラスターの特徴と，それに伴う旅行イメージの違いについて報告する。」

ということなので，この方針にそって分析してみます。

デモグラフィック要因のクラスタリングから

名義尺度の有効性チェック

まず，でもグラ変数として年代(group)，性別(sc1),既婚未婚(sc4),子供の有無(sc5),職業(sc6),最終学歴(sc7),世帯収入(sc8),暮し向き(sc9),住まい(sc10),居住年数(sc11)をデータセットとしてつくりました。

そこから経県値を予測してみます。経県値は今回から，「住んだことがある」をカウントせずに，行ったことがあるかどうかの４段階を足し算したスコアとしました。予測変数をどうするか，というのを考えるために，まずは分岐樹で分析します（非線形性の可能性を考えて)。

library(mvpart)
result.tree <- rpart(KK.total~group+sc1+sc4+sc5+sc6+sc7+sc8+sc9+sc10+sc11,data=travel)
plot(result.tree)
text(result.tree)

plot of chunk unnamed-chunk-2

経県値総得点を決めるのは，group,sc6,sc8が強そうです。

result.tree <- rpart(KK.total~group+sc6+sc8,data=travel)
plot(result.tree)
text(result.tree)

plot of chunk unnamed-chunk-3

経県値の分布が性気分布ではなく少し左に歪んでいるので，ポアソン分布を仮定した回帰分析をしてみます。

hist(travel$KK.total)

plot of chunk unnamed-chunk-4

result.glm <- glm(KK.total+1~group+sc6+sc8,data=travel,family=poisson)
summary(result.glm)

## 
## Call:
## glm(formula = KK.total + 1 ~ group + sc6 + sc8, family = poisson, 
##     data = travel)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -10.154   -2.033   -0.416    1.627    8.847  
## 
## Coefficients:
##                                         Estimate Std. Error z value
## (Intercept)                               3.6028     0.0347  103.84
## groupM30                                  0.1300     0.0333    3.90
## groupM40                                  0.2591     0.0325    7.97
## groupM50                                  0.3913     0.0319   12.25
## groupM60                                  0.6104     0.0330   18.50
## groupF20                                 -0.0968     0.0343   -2.82
## groupF30                                  0.1945     0.0339    5.73
## groupF40                                  0.2418     0.0336    7.19
## groupF50                                  0.3928     0.0329   11.95
## groupF60                                  0.5942     0.0335   17.75
## sc6会社勤務（事務職）                     0.2392     0.0291    8.21
## sc6会社勤務（販売・労務職）               0.0999     0.0570    1.75
## sc6会社勤務（管理職）                     0.2993     0.0284   10.55
## sc6会社経営（経営者・役員）               0.2156     0.0386    5.59
## sc6公務員・教職員・非営利団体職員         0.2420     0.0313    7.74
## sc6派遣社員・契約社員                     0.0963     0.0307    3.14
## sc6自営業（商工サービス）                 0.1311     0.0296    4.43
## sc6ＳＯＨＯ                              -0.0261     0.0712   -0.37
## sc6農林漁業                              -0.1269     0.1313   -0.97
## sc6専門職（弁護士・税理士等・医療関連）   0.0820     0.0352    2.33
## sc6パート・アルバイト                    -0.0733     0.0244   -3.00
## sc6専業主婦                              -0.0348     0.0245   -1.42
## sc6学生                                  -0.0723     0.0458   -1.58
## sc6無職（年金生活者を含む）              -0.1544     0.0283   -5.45
## sc6その他の職業                          -0.0342     0.0470   -0.73
## sc8200-400                                0.0245     0.0242    1.01
## sc8400-600                                0.0576     0.0244    2.37
## sc8600-800                                0.1984     0.0259    7.65
## sc8800-1000                               0.3245     0.0267   12.14
## sc81000-1500                              0.1957     0.0323    6.05
## sc81500-2000                              0.0818     0.0713    1.15
## sc820000-                                 0.5442     0.0493   11.04
## sc8DK/NA                                  0.1135     0.0253    4.49
##                                         Pr(>|z|)    
## (Intercept)                              < 2e-16 ***
## groupM30                                 9.4e-05 ***
## groupM40                                 1.6e-15 ***
## groupM50                                 < 2e-16 ***
## groupM60                                 < 2e-16 ***
## groupF20                                  0.0048 ** 
## groupF30                                 9.8e-09 ***
## groupF40                                 6.6e-13 ***
## groupF50                                 < 2e-16 ***
## groupF60                                 < 2e-16 ***
## sc6会社勤務（事務職）                    2.3e-16 ***
## sc6会社勤務（販売・労務職）               0.0797 .  
## sc6会社勤務（管理職）                    < 2e-16 ***
## sc6会社経営（経営者・役員）              2.3e-08 ***
## sc6公務員・教職員・非営利団体職員        1.0e-14 ***
## sc6派遣社員・契約社員                     0.0017 ** 
## sc6自営業（商工サービス）                9.3e-06 ***
## sc6ＳＯＨＯ                               0.7134    
## sc6農林漁業                               0.3340    
## sc6専門職（弁護士・税理士等・医療関連）   0.0197 *  
## sc6パート・アルバイト                     0.0027 ** 
## sc6専業主婦                               0.1555    
## sc6学生                                   0.1141    
## sc6無職（年金生活者を含む）              4.9e-08 ***
## sc6その他の職業                           0.4668    
## sc8200-400                                0.3117    
## sc8400-600                                0.0180 *  
## sc8600-800                               2.0e-14 ***
## sc8800-1000                              < 2e-16 ***
## sc81000-1500                             1.4e-09 ***
## sc81500-2000                              0.2516    
## sc820000-                                < 2e-16 ***
## sc8DK/NA                                 7.0e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 6791.3  on 499  degrees of freedom
## Residual deviance: 4433.0  on 467  degrees of freedom
## AIC: 7371
## 
## Number of Fisher Scoring iterations: 5

この三つはバリバリ効いているみたいなので，この三つだけ取り出して分類していくことにします。

数値化してからクラスタリング

まずは双対尺度法で数値化します。

demogra <- subset(travel,select=c("group","sc6","sc8"))
source("http://aoki2.si.gunma-u.ac.jp/R/src/make.dummy.R", encoding="euc-jp")
demogra.c <- make.dummy(demogra)
library(ca)
result.ca <- ca(demogra.c)
plot(result.ca)

plot of chunk unnamed-chunk-5 ここの座標値＝与えられた数値と，経県値を引っ付けて数字のデータセットとして考えます。

数値化できたので，クラスター分析して分類することを考えます。変数同士の相関が考えられるので，マハラノビス距離にして階層的クラスタリングをします。

dist_set <- cbind(result.ca$rowcoord,travel$KK.total)
set.seed(20141112)
dat <- scale(dist_set)
D2 <- mahalanobis(dat,colMeans(dat),cov(dat))
result.hcl <- hclust(dist(D2),method="ward.D2")
plot(result.hcl)

plot of chunk unnamed-chunk-6

hcl.class <- cutree(result.hcl,5)

5クラスタぐらいでしょうか。非階層的クラスタリングでもやってみます。

result.km <- kmeans(D2,5)
km.class <- result.km$cluster
xtabs(~km.class+hcl.class)

##         hcl.class
## km.class   1   2   3   4   5
##        1   2   0   0   0   0
##        2   0   0   0 255  30
##        3   0   0   0   0 166
##        4   0   0  32   0   0
##        5   0  15   0   0   0

result.km2 <- kmeans(D2,5)
xtabs(~result.km2$cluster+result.km$cluster)

##                   result.km$cluster
## result.km2$cluster   1   2   3   4   5
##                  1   0 285   0   0   0
##                  2   2   0   0   0   0
##                  3   0   0   0   0  15
##                  4   0   0   0  32   0
##                  5   0   0 166   0   0

階層的クラスタリングの分類と，非階層的クラスタリングの分類がほぼ一致しました。今後は非階層的クラスタリングの結果を使います。

travel$class <- as.factor(km.class)
travel <- travel[order(travel$class),]

クラスタリングはこれでおわり。あとあとややこしくなるので，クラスタの順に並べ直しておきます。

クラスタリングの特徴をみるなど

元データとの関係

クラスタリングの結果が元のデータとどういう関係にあるかみます。まずは経県値。クラスタリングを独立変数，経県値を従属変数にした分散分析をします。注)クラスタ1は二人しかいませんから，以後の分析では除外します。

source("anovakun_462.txt")
dat <- cbind(travel$class,travel$KK.total)
dat <- subset(dat,dat[,1]!=1)
dat[,1] <- dat[,1]-1
anovakun(data.frame(dat),"As",4,peta=T)

## 
## [ As-Type Design ] 
##  
## This output was generated by anovakun 4.6.2 under R version 3.1.1. 
## It was executed on Thu Nov 13 11:13:36 2014. 
##  
##  
## << DESCRIPTIVE STATISTICS >>
## 
## ----------------------------
##   A   n     Mean     S.D. 
## ----------------------------
##  a1 285  50.2772  23.5564 
##  a2 166  62.8916  30.3405 
##  a3  32  71.4062  26.4671 
##  a4  15  61.6667  34.7104 
## ----------------------------
## 
## 
## << ANOVA TABLE >>
## 
## == This data is UNBALANCED!! ==
## == Type III SS is applied. ==
## 
## ------------------------------------------------------------------
## Source           SS  df         MS  F-ratio  p-value      p.eta^2 
## ------------------------------------------------------------------
##      A   25278.1695   3  8426.0565  11.9589   0.0000 ***   0.0677 
##  Error  348066.2020 494   704.5875        
## ------------------------------------------------------------------
##  Total  373344.3715 497 
##                        +p < .10, *p < .05, **p < .01, ***p < .001 
## 
##  
## << POST ANALYSES >>
## 
## < MULTIPLE COMPARISON for "A" >
## 
## == Shaffer's Modified Sequentially Rejective Bonferroni Procedure ==
## == The factor < A > is analysed as independent means. == 
## == Alpha level is 0.05. == 
##  
## ----------------------------
##   A   n     Mean     S.D. 
## ----------------------------
##  a1 285  50.2772  23.5564 
##  a2 166  62.8916  30.3405 
##  a3  32  71.4062  26.4671 
##  a4  15  61.6667  34.7104 
## ----------------------------
## 
## --------------------------------------------------------------
##    Pair       Diff  t-value  df        p    adj.p             
## --------------------------------------------------------------
##   a1-a2   -12.6144   4.8673 494   0.0000   0.0000   a1 < a2 * 
##   a1-a3   -21.1291   4.2695 494   0.0000   0.0001   a1 < a3 * 
##   a2-a3    -8.5147   1.6615 494   0.0972   0.2917   a2 = a3   
##   a1-a4   -11.3895   1.6197 494   0.1059   0.3178   a1 = a4   
##   a3-a4     9.7396   1.1726 494   0.2415   0.4831   a3 = a4   
##   a2-a4     1.2249   0.1712 494   0.8642   0.8642   a2 = a4   
## --------------------------------------------------------------
## 
## 
## output is over --------------------///

boxplot(KK.total~class,data=travel)

plot of chunk unnamed-chunk-9

バリバリ有意差があります。ポアソン分布を仮定した回帰分析でももちろんオーケー。

result.glm <- glm(KK.total+1~class,data=travel,family=poisson)
summary(result.glm)

## 
## Call:
## glm(formula = KK.total + 1 ~ class, family = poisson, data = travel)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -10.055   -2.780   -0.238    2.223    8.162  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.93183    0.09901   39.71  < 2e-16 ***
## class2       0.00542    0.09936    0.05  0.95649    
## class3       0.22536    0.09949    2.27  0.02350 *  
## class4       0.35047    0.10117    3.46  0.00053 ***
## class5       0.20600    0.10425    1.98  0.04814 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 6791.3  on 499  degrees of freedom
## Residual deviance: 6356.8  on 495  degrees of freedom
## AIC: 9239
## 
## Number of Fisher Scoring iterations: 4

あとは他のデモグラフィック要因との関係。クロス集計で見ていきます。

xtabs(~group+class,data=travel)

##      class
## group  1  2  3  4  5
##   M20  0 22 26  1  1
##   M30  0 25 17  6  2
##   M40  1 18 24  4  3
##   M50  0 17 25  7  1
##   M60  1 24 16  7  2
##   F20  0 34 12  1  3
##   F30  0 41  7  1  1
##   F40  0 31 15  2  2
##   F50  0 36 13  1  0
##   F60  0 37 11  2  0

xtabs(~sc1+class,data=travel)

##         class
## sc1        1   2   3   4   5
##   male     2 106 108  25   9
##   female   0 179  58   7   6

xtabs(~sc6+class,data=travel)

##                                       class
## sc6                                     1  2  3  4  5
##   会社勤務（一般社員）                  0 81  7  0  2
##   会社勤務（事務職）                    0  2 24  0  1
##   会社勤務（販売・労務職）              0  0  0  7  0
##   会社勤務（管理職）                    0  1 23  0  0
##   会社経営（経営者・役員）              0  0  0  9  2
##   公務員・教職員・非営利団体職員        0  0 17  1  2
##   派遣社員・契約社員                    0  1 23  1  1
##   自営業（商工サービス）                0  2 23  0  1
##   ＳＯＨＯ                              0  0  0  0  4
##   農林漁業                              1  0  0  0  0
##   専門職（弁護士・税理士等・医療関連）  0  0 12  4  0
##   パート・アルバイト                    0 67  6  0  0
##   専業主婦                              0 90  4  2  0
##   学生                                  0  0 18  0  1
##   無職（年金生活者を含む）              0 41  9  0  1
##   その他の職業                          1  0  0  8  0

xtabs(~sc8+class,data=travel)

##            class
## sc8          1  2  3  4  5
##   200        0 42 18  1  1
##   200-400    2 72 26  7  2
##   400-600    0 66 33  6  1
##   600-800    0 37 27  1  1
##   800-1000   0 22 22  5  0
##   1000-1500  0  0 19  7  1
##   1500-2000  0  0  0  0  4
##   20000-     0  0  0  2  4
##   DK/NA      0 46 21  3  1

尺度との関係

push要因＝旅行動機の尺度(Q1)

まずは因子分析してみます。

motive <- travel[16:45]
library(psych)
fa.parallel(motive)

## Loading required package: parallel
## Loading required package: MASS

plot of chunk unnamed-chunk-12

## Parallel analysis suggests that the number of factors =  7  and the number of components =  4

result.fa <- fa(motive,fm="ml",nfactors=4,rotate="oblimin",scores=T)

## Loading required package: GPArotation

print(result.fa,sort=T,cut=0.3)

## Factor Analysis using method =  ml
## Call: fa(r = motive, nfactors = 4, rotate = "oblimin", scores = T, 
##     fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##       item   ML2   ML1   ML3   ML4   h2   u2 com
## q1_16   16  0.86                   0.68 0.32 1.0
## q1_15   15  0.85                   0.64 0.36 1.0
## q1_17   17  0.82                   0.63 0.37 1.0
## q1_4     4  0.70                   0.54 0.46 1.1
## q1_5     5  0.59                   0.46 0.54 1.1
## q1_18   18  0.59                   0.39 0.61 1.5
## q1_3     3  0.57                   0.51 0.49 1.3
## q1_26   26  0.46                   0.35 0.65 1.6
## q1_1     1  0.44        0.31       0.58 0.42 2.2
## q1_2     2  0.41  0.33             0.43 0.57 2.0
## q1_6     6  0.41                   0.56 0.44 2.5
## q1_21   21  0.41        0.38       0.49 0.51 2.0
## q1_27   27  0.37  0.31             0.52 0.48 2.7
## q1_11   11        0.84             0.73 0.27 1.0
## q1_13   13        0.84             0.67 0.33 1.0
## q1_12   12        0.82             0.71 0.29 1.0
## q1_14   14        0.71             0.64 0.36 1.1
## q1_28   28  0.36  0.45             0.51 0.49 2.1
## q1_29   29  0.32  0.41             0.56 0.44 2.4
## q1_30   30  0.36  0.40             0.54 0.46 2.3
## q1_7     7              0.89       0.66 0.34 1.1
## q1_9     9              0.77       0.65 0.35 1.0
## q1_8     8              0.75       0.53 0.47 1.0
## q1_10   10              0.61       0.54 0.46 1.3
## q1_22   22              0.59       0.54 0.46 1.2
## q1_20   20              0.45       0.43 0.57 1.6
## q1_19   19              0.43       0.46 0.54 1.8
## q1_23   23                    0.76 0.62 0.38 1.0
## q1_24   24                    0.75 0.65 0.35 1.0
## q1_25   25  0.44             -0.51 0.43 0.57 2.3
## 
##                        ML2  ML1  ML3  ML4
## SS loadings           5.97 4.45 4.28 1.92
## Proportion Var        0.20 0.15 0.14 0.06
## Cumulative Var        0.20 0.35 0.49 0.55
## Proportion Explained  0.36 0.27 0.26 0.12
## Cumulative Proportion 0.36 0.63 0.88 1.00
## 
##  With factor correlations of 
##      ML2  ML1  ML3  ML4
## ML2 1.00 0.28 0.43 0.16
## ML1 0.28 1.00 0.56 0.44
## ML3 0.43 0.56 1.00 0.27
## ML4 0.16 0.44 0.27 1.00
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 4 factors are sufficient.
## 
## The degrees of freedom for the null model are  435  and the objective function was  19.47 with Chi Square of  9504
## The degrees of freedom for the model are 321  and the objective function was  2.76 
## 
## The root mean square of the residuals (RMSR) is  0.04 
## The df corrected root mean square of the residuals is  0.05 
## 
## The harmonic number of observations is  500 with the empirical chi square  683.7  with prob <  2.4e-28 
## The total number of observations was  500  with MLE Chi Square =  1338  with prob <  4.5e-124 
## 
## Tucker Lewis Index of factoring reliability =  0.847
## RMSEA index =  0.081  and the 90 % confidence intervals are  0.075 0.084
## BIC =  -656.9
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                 ML2  ML1  ML3  ML4
## Correlation of scores with factors             0.96 0.96 0.95 0.90
## Multiple R square of scores with factors       0.93 0.92 0.90 0.81
## Minimum correlation of possible factor scores  0.85 0.84 0.80 0.62

4因子として，因子得点を従属変数，クラスターを独立変数とした分散分析をします。第四因子だけ有意差がみられました。注）クラスタ1は2名だけなので除外したので，a1,a2,a3,a4がそれぞれCL2,CL3,CL4,CL5に対応していることに注意

dat <- cbind(travel$class,result.fa$scores[,4])
# class 1 has member 2 ===> omit
dat <- subset(dat,dat[,1]!=1)
dat[,1] <- dat[,1]-1
anovakun(data.frame(dat),"As",4,peta=T)

## 
## [ As-Type Design ] 
##  
## This output was generated by anovakun 4.6.2 under R version 3.1.1. 
## It was executed on Thu Nov 13 11:13:37 2014. 
##  
##  
## << DESCRIPTIVE STATISTICS >>
## 
## ---------------------------
##   A   n     Mean    S.D. 
## ---------------------------
##  a1 285  -0.0904  0.8790 
##  a2 166   0.0616  0.9625 
##  a3  32   0.2764  0.6773 
##  a4  15   0.5082  0.6976 
## ---------------------------
## 
## 
## << ANOVA TABLE >>
## 
## == This data is UNBALANCED!! ==
## == Type III SS is applied. ==
## 
## ----------------------------------------------------------
## Source        SS  df      MS F-ratio  p-value     p.eta^2 
## ----------------------------------------------------------
##      A    9.2794   3  3.0931  3.8849   0.0092 **   0.0230 
##  Error  393.3181 494  0.7962       
## ----------------------------------------------------------
##  Total  402.5976 497 
##                +p < .10, *p < .05, **p < .01, ***p < .001 
## 
##  
## << POST ANALYSES >>
## 
## < MULTIPLE COMPARISON for "A" >
## 
## == Shaffer's Modified Sequentially Rejective Bonferroni Procedure ==
## == The factor < A > is analysed as independent means. == 
## == Alpha level is 0.05. == 
##  
## ---------------------------
##   A   n     Mean    S.D. 
## ---------------------------
##  a1 285  -0.0904  0.8790 
##  a2 166   0.0616  0.9625 
##  a3  32   0.2764  0.6773 
##  a4  15   0.5082  0.6976 
## ---------------------------
## 
## -------------------------------------------------------------
##    Pair      Diff  t-value  df        p    adj.p             
## -------------------------------------------------------------
##   a1-a4   -0.5987   2.5328 494   0.0116   0.0698   a1 = a4   
##   a1-a3   -0.3668   2.2052 494   0.0279   0.0837   a1 = a3   
##   a2-a4   -0.4466   1.8564 494   0.0640   0.1920   a2 = a4   
##   a1-a2   -0.1521   1.7455 494   0.0815   0.2445   a1 = a2   
##   a2-a3   -0.2148   1.2467 494   0.2131   0.4262   a2 = a3   
##   a3-a4   -0.2318   0.8303 494   0.4068   0.4262   a3 = a4   
## -------------------------------------------------------------
## 
## 
## output is over --------------------///

pull要因＝旅行先に期待するもの(Q2)

旅行に求めるものを因子分析し，分散分析する，という一連の流れをやってみました。

expect <- travel[46:56]
fa.parallel(expect)

plot of chunk unnamed-chunk-14

## Parallel analysis suggests that the number of factors =  4  and the number of components =  2

result.fa2 <- fa(expect,fm="ml",nfactors=4,rotate="oblimin",scores=T)
print(result.fa2,sort=T,cut=0.3)

## Factor Analysis using method =  ml
## Call: fa(r = expect, nfactors = 4, rotate = "oblimin", scores = T, 
##     fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##       item   ML2   ML1   ML3   ML4   h2   u2 com
## q2_11   11  0.77                   0.58 0.42 1.0
## q2_10   10  0.68                   0.50 0.50 1.0
## q2_3     3  0.68                   0.56 0.44 1.2
## q2_4     4  0.66                   0.51 0.49 1.3
## q2_2     2        0.80             0.71 0.29 1.0
## q2_5     5        0.62             0.52 0.48 1.4
## q2_6     6        0.46             0.41 0.59 2.0
## q2_7     7              0.84       0.79 0.21 1.0
## q2_1     1              0.52       0.36 0.64 1.6
## q2_8     8                    0.58 0.49 0.51 1.2
## q2_9     9                    0.58 0.70 0.30 1.6
## 
##                        ML2  ML1  ML3  ML4
## SS loadings           2.13 1.63 1.34 1.02
## Proportion Var        0.19 0.15 0.12 0.09
## Cumulative Var        0.19 0.34 0.46 0.56
## Proportion Explained  0.35 0.27 0.22 0.17
## Cumulative Proportion 0.35 0.61 0.83 1.00
## 
##  With factor correlations of 
##      ML2  ML1  ML3  ML4
## ML2 1.00 0.24 0.21 0.24
## ML1 0.24 1.00 0.43 0.60
## ML3 0.21 0.43 1.00 0.34
## ML4 0.24 0.60 0.34 1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 4 factors are sufficient.
## 
## The degrees of freedom for the null model are  55  and the objective function was  3.94 with Chi Square of  1949
## The degrees of freedom for the model are 17  and the objective function was  0.11 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.04 
## 
## The harmonic number of observations is  500 with the empirical chi square  28.02  with prob <  0.045 
## The total number of observations was  500  with MLE Chi Square =  56.4  with prob <  4.1e-06 
## 
## Tucker Lewis Index of factoring reliability =  0.932
## RMSEA index =  0.069  and the 90 % confidence intervals are  0.049 0.088
## BIC =  -49.24
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                 ML2  ML1  ML3  ML4
## Correlation of scores with factors             0.91 0.90 0.90 0.85
## Multiple R square of scores with factors       0.82 0.81 0.81 0.72
## Minimum correlation of possible factor scores  0.64 0.63 0.63 0.44

これはどこにも有意差がありませんでした。

イメージの違い

以下学会発表

クラスタごとに平均的な距離行列を作り，個人差MDSでプロットしてみます。GD2014の発表と同じやり方です。クラスタの経県値平均が50,60,70のCL2,CL3,CL4をターゲットにしてみましょう。

三次元ぐらいあったほうが適合は良いようです。

library(smacof)

## Loading required package: rgl

ind.dist <- list(psy.dist.class2,psy.dist.class3,psy.dist.class4)
result.ind <- smacofIndDiff(ind.dist,ndim=3,constraint="indscal",type="ordinal",itmax=10000)
result.ind

## 
## Call: smacofIndDiff(delta = ind.dist, ndim = 3, type = "ordinal", constraint = "indscal", 
##     itmax = 10000)
## 
## Model: Three-way SMACOF 
## Number of objects: 10 
## Stress-1 value: 0.1091 
## Number of iterations: 4639

次元の歪みを見てみます。

result.ind$cweights

## [[1]]
##       D1     D2    D3
## D1 1.365 0.0000 0.000
## D2 0.000 0.7981 0.000
## D3 0.000 0.0000 0.804
## 
## [[2]]
##       D1     D2    D3
## D1 1.204 0.0000 0.000
## D2 0.000 0.8008 0.000
## D3 0.000 0.0000 1.062
## 
## [[3]]
##       D1     D2    D3
## D1 1.106 0.0000 0.000
## D2 0.000 0.8617 0.000
## D3 0.000 0.0000 1.129

ちょっとした結論

CL2(経験値50台)は第一次元を伸ばし，第二，第三次元を縮めます。
CL3(経験値60台)は第一次元を伸ばし，第二を縮めますが，第三次元はそのままです。。
CL4(経験値70台)は第一次元を伸ばし，第二を縮め，第三次元を伸ばします。

経験値との関係で行くと，

第一次元は経県値が増えると縮んでいく
第二次元は経県値が増えると伸びていく
第三次元は経県値が増えると伸びていく

ということになります。次元ごとのプロットは次の通りです。

plot(result.ind$gspace[,1:2],xlab="Dim1",ylab="Dim2",type="n",main="Dim 1 and 2")
text(result.ind$gspace[,1:2],rownames(psy.dist.class1))

plot of chunk unnamed-chunk-18

plot(result.ind$gspace[,1],result.ind$gspace[,3],xlab="Dim1",ylab="Dim3",type="n",main="Dim 1 and 3")
text(result.ind$gspace[,1],result.ind$gspace[,3],rownames(psy.dist.class1))

plot of chunk unnamed-chunk-18

plot(result.ind$gspace[,2:3],xlab="Dim2",ylab="Dim3",type="n",main="Dim 2 and 3")
text(result.ind$gspace[,2:3],rownames(psy.dist.class1))

plot of chunk unnamed-chunk-18

まとめて表示。

library(scatterplot3d)
obj <- scatterplot3d(result.ind$gspace,xlim=c(-1,1),ylim=c(-1,1),zlim=c(-1,1))
text(obj$xyz.convert(result.ind$gspace), labels=rownames(psy.dist.class1), pos=4)

plot of chunk unnamed-chunk-19

等高線モデルもね。

第二，第三次元が経県値とともに伸びていく＝強調されていく次元ですから，この二つの次元についてのマッピングに「行きたいかどうか」の評定値のクラス平均を等高線モデルで上書きしてみます。今回はIDW法を使いました。

IDW.cl2 <- IDW.map(map.cl2,loc)
image(x=IDW.cl2$x, y=IDW.cl2$y,z=IDW.cl2$valence,axes=F,xlab="",ylab="",col=heat.colors(20),breaks=seq(4,6,0.1),main="cluster 2(Keiken Lv50)")
contour(x=IDW.cl2$x,y=IDW.cl2$y,z=IDW.cl2$valence,add=T,drawlabels=T,levels=seq(4,6,0.1))
text(map.cl2[,1],map.cl2[,2],rownames(psy.dist.class2))

plot of chunk unnamed-chunk-21

IDW.cl3 <- IDW.map(map.cl3,loc)
image(x=IDW.cl3$x, y=IDW.cl3$y,z=IDW.cl3$valence,axes=F,xlab="",ylab="",col=heat.colors(20),breaks=seq(4,6,0.1),main="cluster 3(Keiken Lv60)")
contour(x=IDW.cl3$x,y=IDW.cl3$y,z=IDW.cl3$valence,add=T,drawlabels=T,levels=seq(4,6,0.1))
text(map.cl3[,1],map.cl3[,2],rownames(psy.dist.class3))

plot of chunk unnamed-chunk-21

IDW.cl4 <- IDW.map(map.cl4,loc)
image(x=IDW.cl4$x, y=IDW.cl4$y,z=IDW.cl4$valence,axes=F,xlab="",ylab="",col=heat.colors(20),breaks=seq(4,6,0.1),main="cluster 4(Keiken Lv70)")
contour(x=IDW.cl4$x,y=IDW.cl4$y,z=IDW.cl4$valence,add=T,drawlabels=T,levels=seq(4,6,0.1))
text(map.cl4[,1],map.cl4[,2],rownames(psy.dist.class4))

plot of chunk unnamed-chunk-21

山沿い(飛騨，湯布院，野沢，秋吉台)，海沿い(志摩，舞鶴，佐世保)，遠方(札幌，宮古島)の三群に別れることと，徐々に遠方を高く評価していること，がわかります。

とりあえずこんなところでどうでしょう。

九州心理学会2014

Kosugitti

2014年11月12日

はじめに

デモグラフィック要因のクラスタリングから

名義尺度の有効性チェック

数値化してからクラスタリング

クラスタリングの特徴をみるなど

元データとの関係

尺度との関係

push要因＝旅行動機の尺度(Q1)

pull要因＝旅行先に期待するもの(Q2)

イメージの違い

以下学会発表

ちょっとした結論

等高線モデルもね。