Here I'll walk through an example of factor-analyzing data collected at the ordinal level on a 4-point scale. We'll compare three approaches in turn: conventional factor analysis, an approach using the graded response model, and an approach using multidimensional IRT. For the background to this post, please also have a look here (“http://kosugitti.hatenadiary.jp/entry/2012/12/19/144228”).
You'll need the following packages this time.
library(psych)
library(ltm)
## Loading required package: MASS
## Loading required package: msm
## Loading required package: mvtnorm
## Loading required package: polycor
## Loading required package: sfsmisc
## Attaching package: 'polycor'
## The following object(s) are masked from 'package:psych':
##
## polyserial
## Attaching package: 'ltm'
## The following object(s) are masked from 'package:psych':
##
## factor.scores
library(mirt)
## Loading required package: stats4
## Loading required package: lattice
## Attaching package: 'mirt'
## The following object(s) are masked from 'package:ltm':
##
## Science
For sample data we'll use the Science data from the ltm package: seven items, each rated on a 4-point scale.
data(Science)
summary(ltm::Science)
## Comfort Environment Work
## strongly disagree: 5 strongly disagree: 29 strongly disagree: 33
## disagree : 32 disagree : 90 disagree : 98
## agree :266 agree :145 agree :206
## strongly agree : 89 strongly agree :128 strongly agree : 55
## Future Technology Industry
## strongly disagree: 14 strongly disagree: 18 strongly disagree: 10
## disagree : 72 disagree : 91 disagree : 47
## agree :210 agree :157 agree :173
## strongly agree : 96 strongly agree :126 strongly agree :162
## Benefit
## strongly disagree: 21
## disagree :100
## agree :193
## strongly agree : 78
Traditionally, even data from a 4-point scale like this were (forcibly) treated as interval-level and analyzed that way. That is, the correlations were Pearson product-moment correlations, and the factor analysis was based on them. Let's give it a quick try. First we recode the data as numeric, interval-level variables.
Science.n <- data.frame(data.matrix(ltm::Science))
Now let's compare the product-moment correlations with the polychoric correlations. The polychoric correlations come from the hetcor function in the polycor package (loaded along with ltm), which computes the type of correlation appropriate to each variable.
print(cor(Science.n), digits = 2)
## Comfort Environment Work Future Technology Industry Benefit
## Comfort 1.000 0.077 0.151 0.284 0.071 0.135 0.334
## Environment 0.077 1.000 -0.071 -0.030 0.387 0.328 -0.032
## Work 0.151 -0.071 1.000 0.401 -0.089 -0.019 0.167
## Future 0.284 -0.030 0.401 1.000 -0.028 0.062 0.313
## Technology 0.071 0.387 -0.089 -0.028 1.000 0.345 -0.012
## Industry 0.135 0.328 -0.019 0.062 0.345 1.000 0.086
## Benefit 0.334 -0.032 0.167 0.313 -0.012 0.086 1.000
result.het <- hetcor(ltm::Science)
print(result.het$correlations, digits = 2)
## Comfort Environment Work Future Technology Industry Benefit
## Comfort 1.000 0.099 0.2012 0.346 0.090 0.1857 0.408
## Environment 0.099 1.000 -0.0828 -0.028 0.464 0.4108 -0.037
## Work 0.201 -0.083 1.0000 0.479 -0.104 -0.0076 0.209
## Future 0.346 -0.028 0.4786 1.000 -0.036 0.1027 0.377
## Technology 0.090 0.464 -0.1039 -0.036 1.000 0.4348 -0.014
## Industry 0.186 0.411 -0.0076 0.103 0.435 1.0000 0.118
## Benefit 0.408 -0.037 0.2086 0.377 -0.014 0.1180 1.000
Looking at the results, the polychoric correlations are larger. Put the other way around, forcing ordinal-level variables to be treated as interval-level and computing product-moment correlations means the values get underestimated because an inappropriate model has been applied.
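To make that underestimation concrete, it can help to look at the element-wise gap between the two matrices. A minimal sketch, reusing the objects computed above:
# difference between the polychoric and product-moment correlation matrices
round(result.het$correlations - cor(Science.n), 2)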
Ordinary factor analysis, then, takes the data regarded as continuous variables, performs an eigendecomposition, and uses that to decide the number of factors and so on. For choosing the number of factors, parallel analysis with the fa.parallel function in the psych package is probably a good choice.
fa.parallel(Science.n)
## Parallel analysis suggests that the number of factors = 3 and the number of components = 2
fa.parallel(result.het$correlations, n.obs = 392)
## Parallel analysis suggests that the number of factors = 3 and the number of components = 2
Well, the result comes out the same either way.
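Incidentally, more recent versions of psych let fa.parallel compute the polychoric correlations from the raw data by itself, so the hetcor detour can be skipped here as well. A sketch, assuming your installed version supports the cor argument:
# parallel analysis on internally computed polychoric correlations
# (the cor = "poly" option is assumed to exist in your psych version)
fa.parallel(Science.n, cor = "poly")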
Let's also run the factor analysis both ways: the classical method that treats the variables as continuous, and the method based on polychoric correlations. First, the classical method.
fa.result.n <- fa(Science.n, nfactors = 3, fm = "ml", rotate = "promax")
print(fa.result.n, sort = TRUE, digits = 3)
## Factor Analysis using method = ml
## Call: fa(r = Science.n, nfactors = 3, rotate = "promax", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
## item ML2 ML1 ML3 h2 u2
## Technology 5 0.639 -0.021 -0.046 0.403 0.597
## Environment 2 0.617 0.002 -0.080 0.372 0.628
## Industry 6 0.545 0.040 0.058 0.314 0.686
## Future 4 0.018 0.793 -0.010 0.620 0.380
## Work 3 -0.082 0.548 -0.067 0.273 0.727
## Benefit 7 -0.076 -0.044 0.799 0.587 0.413
## Comfort 1 0.119 0.170 0.341 0.236 0.764
##
## ML2 ML1 ML3
## SS loadings 1.103 0.952 0.751
## Proportion Var 0.158 0.136 0.107
## Cumulative Var 0.158 0.294 0.401
## Proportion Explained 0.393 0.339 0.268
## Cumulative Proportion 0.393 0.732 1.000
##
## With factor correlations of
## ML2 ML1 ML3
## ML2 1.000 -0.008 0.154
## ML1 -0.008 1.000 0.560
## ML3 0.154 0.560 1.000
##
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 21 and the objective function was 0.825 with Chi Square of 320
## The degrees of freedom for the model are 3 and the objective function was 0.001
##
## The root mean square of the residuals (RMSR) is 0.002
## The df corrected root mean square of the residuals is 0.009
## The number of observations was 392 with Chi Square = 0.222 with prob < 0.974
##
## Tucker Lewis Index of factoring reliability = 1.065
## RMSEA index = 0 and the 90 % confidence intervals are NA NA
## BIC = -17.69
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## ML2 ML1 ML3
## Correlation of scores with factors 0.798 0.836 0.813
## Multiple R square of scores with factors 0.636 0.699 0.661
## Minimum correlation of possible factor scores 0.273 0.399 0.322
Next, the version using polychoric correlations.
fa.result <- fa(result.het$correlations, nfactors = 3, n.obs = 392, fm = "ml",
rotate = "promax")
print(fa.result, sort = TRUE, digits = 3)
## Factor Analysis using method = ml
## Call: fa(r = result.het$correlations, nfactors = 3, n.obs = 392, rotate = "promax",
## fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
## item ML2 ML3 ML1 h2 u2
## Technology 5 0.700 -0.049 -0.047 0.485 0.515
## Environment 2 0.674 -0.009 -0.089 0.443 0.557
## Industry 6 0.630 0.065 0.055 0.421 0.579
## Future 4 0.016 0.832 -0.009 0.686 0.314
## Work 3 -0.090 0.633 -0.090 0.351 0.649
## Benefit 7 -0.075 -0.050 0.895 0.736 0.264
## Comfort 1 0.143 0.213 0.352 0.293 0.707
##
## ML2 ML3 ML1
## SS loadings 1.364 1.132 0.919
## Proportion Var 0.195 0.162 0.131
## Cumulative Var 0.195 0.357 0.488
## Proportion Explained 0.399 0.331 0.269
## Cumulative Proportion 0.399 0.731 1.000
##
## With factor correlations of
## ML2 ML3 ML1
## ML2 1.000 0.025 0.160
## ML3 0.025 1.000 0.573
## ML1 0.160 0.573 1.000
##
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 21 and the objective function was 1.249 with Chi Square of 484.3
## The degrees of freedom for the model are 3 and the objective function was 0.001
##
## The root mean square of the residuals (RMSR) is 0.002
## The df corrected root mean square of the residuals is 0.009
## The number of observations was 392 with Chi Square = 0.287 with prob < 0.963
##
## Tucker Lewis Index of factoring reliability = 1.041
## RMSEA index = 0 and the 90 % confidence intervals are NA NA
## BIC = -17.63
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## ML2 ML3 ML1
## Correlation of scores with factors 0.845 0.872 0.881
## Multiple R square of scores with factors 0.714 0.761 0.776
## Minimum correlation of possible factor scores 0.428 0.522 0.553
Once again the results don't change much, but the latter yields somewhat larger loadings; the former had been somewhat "shrunk."
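One easy way to see that shrinkage in numbers is to line up the communalities from the two solutions. A minimal sketch, using the fa objects fitted above:
# communalities side by side: Pearson-based vs. polychoric-based solution
round(cbind(pearson = fa.result.n$communality,
            polychoric = fa.result$communality), 3)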
The latter is how categorical factor analysis is done, but having to run hetcor first to get the polychoric correlations is a bit of a nuisance. From here, a natural route is to follow this factor analysis up with IRT (GRM). Recently the psych package gained a function called irt.fa, which runs the factor analysis from polychoric correlations. The variables need to be passed as numeric (not as factors), and it goes like this.
result.irtFa <- irt.fa(Science.n, nfactors = 3, fm = "ml")
## Loading required package: GPArotation
result.irtFa$fa
## Factor Analysis using method = ml
## Call: fa(r = r, nfactors = nfactors, n.obs = n.obs, fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML2 ML3 ML1 h2 u2
## Comfort 0.17 0.24 0.33 0.28 0.72
## Environment 0.67 -0.02 -0.06 0.44 0.56
## Work -0.09 0.61 -0.06 0.35 0.65
## Future 0.02 0.82 0.02 0.70 0.30
## Technology 0.69 -0.05 -0.02 0.49 0.51
## Industry 0.63 0.07 0.07 0.42 0.58
## Benefit -0.02 -0.01 0.90 0.80 0.20
##
## ML2 ML3 ML1
## SS loadings 1.37 1.14 0.96
## Proportion Var 0.20 0.16 0.14
## Cumulative Var 0.20 0.36 0.50
## Proportion Explained 0.39 0.33 0.28
## Cumulative Proportion 0.39 0.72 1.00
##
## With factor correlations of
## ML2 ML3 ML1
## ML2 1.00 0.00 0.06
## ML3 0.00 1.00 0.49
## ML1 0.06 0.49 1.00
##
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 21 and the objective function was 1.25 with Chi Square of 484.1
## The degrees of freedom for the model are 3 and the objective function was 0
##
## The root mean square of the residuals (RMSR) is 0
## The df corrected root mean square of the residuals is 0.01
## The number of observations was 392 with Chi Square = 0.28 with prob < 0.96
##
## Tucker Lewis Index of factoring reliability = 1.041
## RMSEA index = 0 and the 90 % confidence intervals are NA NA
## BIC = -17.64
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## ML2 ML3 ML1
## Correlation of scores with factors 0.84 0.87 0.90
## Multiple R square of scores with factors 0.71 0.76 0.81
## Minimum correlation of possible factor scores 0.43 0.52 0.63
To see the full picture of the results, use the print function with the short = FALSE option.
print(result.irtFa, short = FALSE)
## Item Response Analysis using Factor Analysis
##
## Call: irt.fa(x = Science.n, nfactors = 3, fm = "ml")
## Item Response Analysis using Factor Analysis
##
## Summary information by factor and item
## Factor = 1
## -3 -2 -1 0 1 2 3
## Environment 0.27 0.41 0.49 0.47 0.36 0.22 0.11
## Technology 0.33 0.46 0.52 0.49 0.38 0.22 0.11
## Industry 0.31 0.39 0.41 0.36 0.26 0.16 0.08
## Test Info 0.91 1.26 1.41 1.32 1.00 0.60 0.30
## SEM 1.05 0.89 0.84 0.87 1.00 1.29 1.82
## Reliability -0.10 0.21 0.29 0.24 0.00 -0.66 -2.31
##
## Factor = 2
## -3 -2 -1 0 1 2 3
## Work 0.20 0.29 0.35 0.35 0.30 0.23 0.15
## Future 0.69 0.72 0.65 0.51 0.58 0.41 0.14
## Test Info 0.89 1.01 1.00 0.86 0.89 0.64 0.29
## SEM 1.06 0.99 1.00 1.08 1.06 1.25 1.85
## Reliability -0.12 0.01 0.00 -0.17 -0.13 -0.57 -2.44
##
## Factor = 3
## -3 -2 -1 0 1 2 3
## Comfort 0.08 0.09 0.09 0.09 0.08 0.07 0.06
## Benefit 0.78 0.65 1.04 0.43 0.53 1.03 0.37
## Test Info 0.86 0.74 1.13 0.51 0.61 1.10 0.43
## SEM 1.08 1.16 0.94 1.40 1.28 0.95 1.53
## Reliability -0.17 -0.35 0.12 -0.95 -0.63 0.09 -1.33
##
## Item discrimination and location for factor ML2
## discrimination location.1 location.2 location.3
## Comfort 0.17 -2.27 -1.33 0.76
## Environment 0.89 -1.94 -0.69 0.60
## Work -0.09 -1.38 -0.43 1.08
## Future 0.02 -1.80 -0.77 0.69
## Technology 0.97 -2.34 -0.82 0.64
## Industry 0.82 -2.52 -1.36 0.28
## Benefit -0.02 -1.61 -0.50 0.85
##
## Item discrimination and location for factor ML3
## discrimination location.1 location.2 location.3
## Comfort 0.25 -2.30 -1.35 0.77
## Environment -0.02 -1.45 -0.51 0.45
## Work 0.76 -1.73 -0.54 1.36
## Future 1.45 -3.18 -1.37 1.22
## Technology -0.05 -1.69 -0.59 0.46
## Industry 0.07 -1.96 -1.06 0.22
## Benefit -0.01 -1.61 -0.50 0.85
##
## Item discrimination and location for factor ML1
## discrimination location.1 location.2 location.3
## Comfort 0.35 -2.37 -1.39 0.79
## Environment -0.06 -1.45 -0.52 0.45
## Work -0.06 -1.38 -0.43 1.08
## Future 0.02 -1.80 -0.77 0.69
## Technology -0.02 -1.69 -0.59 0.46
## Industry 0.07 -1.96 -1.06 0.22
## Benefit 2.03 -3.65 -1.13 1.92
##
## These parameters were based on the following factor analysis
## Factor Analysis using method = ml
## Call: fa(r = r, nfactors = nfactors, n.obs = n.obs, fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
## ML2 ML3 ML1 h2 u2
## Comfort 0.17 0.24 0.33 0.28 0.72
## Environment 0.67 -0.02 -0.06 0.44 0.56
## Work -0.09 0.61 -0.06 0.35 0.65
## Future 0.02 0.82 0.02 0.70 0.30
## Technology 0.69 -0.05 -0.02 0.49 0.51
## Industry 0.63 0.07 0.07 0.42 0.58
## Benefit -0.02 -0.01 0.90 0.80 0.20
##
## ML2 ML3 ML1
## SS loadings 1.37 1.14 0.96
## Proportion Var 0.20 0.16 0.14
## Cumulative Var 0.20 0.36 0.50
## Proportion Explained 0.39 0.33 0.28
## Cumulative Proportion 0.39 0.72 1.00
##
## With factor correlations of
## ML2 ML3 ML1
## ML2 1.00 0.00 0.06
## ML3 0.00 1.00 0.49
## ML1 0.06 0.49 1.00
##
## Test of the hypothesis that 3 factors are sufficient.
##
## The degrees of freedom for the null model are 21 and the objective function was 1.25 with Chi Square of 484.1
## The degrees of freedom for the model are 3 and the objective function was 0
##
## The root mean square of the residuals (RMSR) is 0
## The df corrected root mean square of the residuals is 0.01
## The number of observations was 392 with Chi Square = 0.28 with prob < 0.96
##
## Tucker Lewis Index of factoring reliability = 1.041
## RMSEA index = 0 and the 90 % confidence intervals are NA NA
## BIC = -17.64
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## ML2 ML3 ML1
## Correlation of scores with factors 0.84 0.87 0.90
## Multiple R square of scores with factors 0.71 0.76 0.81
## Minimum correlation of possible factor scores 0.43 0.52 0.63
It seems you can do various other things with it as well, but the function's behavior is still a little shaky.
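For example, the object can supposedly be plotted directly to get item and test information curves. A sketch, assuming plot.irt.fa supports these type options in your psych version:
# item characteristic curves and test information from the irt.fa result
plot(result.irtFa, type = "ICC")
plot(result.irtFa, type = "test")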
To run IRT (GRM) itself, it may be safer to use the dedicated ltm package, which has been around for a while. Since the first factor consists of Technology, Environment, and Industry, the second of Future and Work, and the third of Benefit and Comfort, we can use the grm function in the ltm package like so:
grm(ltm::Science[c(5, 2, 6)]) #Factor 1
##
## Call:
## grm(data = ltm::Science[c(5, 2, 6)])
##
## Coefficients:
## Extrmt1 Extrmt2 Extrmt3 Dscrmn
## Technology -2.386 -0.855 0.624 1.751
## Environment -2.112 -0.772 0.618 1.626
## Industry -3.005 -1.589 0.301 1.522
##
## Log.Lik: -1312
grm(ltm::Science[c(4, 3)]) #Factor 2
##
## Call:
## grm(data = ltm::Science[c(4, 3)])
##
## Coefficients:
## Extrmt1 Extrmt2 Extrmt3 Dscrmn
## Future -2.538 -1.063 0.942 1.817
## Work -1.982 -0.610 1.547 1.685
##
## Log.Lik: -856.8
grm(ltm::Science[c(7, 1)]) #Factor 3
##
## Call:
## grm(data = ltm::Science[c(7, 1)])
##
## Coefficients:
## Extrmt1 Extrmt2 Extrmt3 Dscrmn
## Benefit -1.879 -0.570 0.975 3.037
## Comfort -4.851 -2.632 1.461 0.994
##
## Log.Lik: -776.2
That's all it takes.
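If you keep the fitted object around, ltm can also draw the item characteristic curves and compute person scores. A minimal sketch for the first factor, assuming the default settings:
# refit the first factor, keeping the object this time
grm.f1 <- grm(ltm::Science[c(5, 2, 6)])
plot(grm.f1, type = "ICC")                  # item characteristic curves
head(ltm::factor.scores(grm.f1)$score.dat)  # person scores for each response pattern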
The catch is that this gives us no factor correlations: IRT (GRM) is fundamentally a single-factor model. What turns it into a multidimensional model is the mirt function in the mirt package. Pass it the numeric variables:
result.mirt <- mirt(Science.n, 3, itemtype = "graded", rotate = "promax")
summary(result.mirt)
##
## Rotation: promax
##
## Rotated factor loadings:
##
## F_1 F_2 F_3 h2
## Comfort 0.293 -0.218 0.347 0.421
## Environment -0.113 -0.665 -0.039 0.429
## Work -0.148 0.048 0.705 0.390
## Future 0.049 0.038 0.747 0.591
## Technology 0.077 -0.679 -0.141 0.449
## Industry -0.069 -0.680 0.107 0.487
## Benefit 1.000 0.068 0.011 0.996
##
## Rotated SS loadings: 1.133 1.422 1.207
##
## Factor correlations:
##
## F_1 F_2 F_3
## F_1 1.000 -0.145 0.569
## F_2 -0.145 1.000 -0.212
## F_3 0.569 -0.212 1.000
and that's what we get.
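Item parameters and person factor scores can also be pulled out of the fitted object. A minimal sketch, assuming the current mirt API:
coef(result.mirt, simplify = TRUE)  # discrimination and category thresholds per item
head(fscores(result.mirt))          # factor scores for each respondent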
So, as you can see, the environment for doing properly categorical factor analysis is falling into place. I hope the standard practice shifts along with it.
As an aside, if you want to do this even more rigorously, there is software called M-plus.