No wonder so many differences exist among people treated the same way in the same conditions of our experiments. Our “error variance” is due not just to measurement error, but also to the hundreds and thousands of differences among people in so many characteristics.

“The joy of discovery is certainly the liveliest that the mind of man can ever feel”
- Claude Bernard -

Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. Routledge.

https://github.com/cddesja/lavaan-reproducible

1 EFA

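Before running any analysis, always check the following:
1. Number of variables
2. Number of observations
3. Variable types (e.g., categorical? nominal? continuous? …)
4. Meaning of the scale (e.g., does a larger number indicate a higher level of the trait? …)
5. Distributions of the variables
6. Missing values
7. Descriptive statistics, correlation matrix, etc.
8. …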

1.1 EFA preliminaries (installing and loading packages, loading the data)

1.1.1 Installing packages

wants <- c("psych","magrittr","lavaan","GPArotation","tidyr") 
has   <- wants %in% rownames(installed.packages())
if(any(!has)) install.packages(wants[!has])

This installs only those required packages that are not installed yet.
wants: the required packages
has: logical vector indicating which of the required packages are already installed

1.1.2 Loading libraries

library(psych)
library(lavaan)
library(magrittr)
library(GPArotation)
library(tidyr)

Load the required libraries.

1.1.3 Loading the data

table4.1 <- read.table("http://quantpsy.cau.ac.kr/wp-content/data/raykov/TABLE4_1.dat", 
                       fill = TRUE, 
                       col.names = c(1:11))
# All other data files are uploaded at the same address under their original file names:
# ~ raykov/***.dat
# The rest of the address is identical, so only the *** part needs to change.
table4.1

##      X1   X2   X3   X4   X5   X6   X7   X8   X9  X10 X11
## 1  1.00   NA   NA   NA   NA   NA   NA   NA   NA   NA  NA
## 2  0.64 1.00   NA   NA   NA   NA   NA   NA   NA   NA  NA
## 3  0.66 0.65 1.00   NA   NA   NA   NA   NA   NA   NA  NA
## 4  0.62 0.62 0.65 1.00   NA   NA   NA   NA   NA   NA  NA
## 5  0.64 0.66 0.62 0.64 1.00   NA   NA   NA   NA   NA  NA
## 6  0.63 0.67 0.63 0.66 0.65 1.00   NA   NA   NA   NA  NA
## 7  0.17 0.12 0.15 0.13 0.12 0.15 1.00   NA   NA   NA  NA
## 8  0.13 0.17 0.15 0.17 0.16 0.18 0.55 1.00   NA   NA  NA
## 9  0.15 0.13 0.12 0.14 0.15 0.12 0.57 0.52 1.00   NA  NA
## 10 0.16 0.15 0.15 0.16 0.17 0.15 0.51 0.57 0.59 1.00  NA
## 11 0.13 0.17 0.16 0.15 0.15 0.11 0.60 0.53 0.55 0.58   1

The raw file is a correlation matrix with only the lower triangle filled in. It cannot be analyzed in this form.

table4.1[upper.tri(table4.1)] <- t(table4.1)[upper.tri(table4.1)]
# Assign the upper triangle of the transposed correlation matrix to the upper triangle of the original matrix
colnames(table4.1) <- c("Information","Comprehension","Arithmetic","Similarities","Digit Span","Vocabulary","Digit Symbol","Picture Completion",
                        "Block Design","Picture Arrangement","Object Assembly")
rownames(table4.1) <- c("Information","Comprehension","Arithmetic","Similarities","Digit Span","Vocabulary","Digit Symbol","Picture Completion",
                        "Block Design","Picture Arrangement","Object Assembly")

# Only for displaying table 4.1 compactly; not important for the analysis
knitr::kable(table4.1, "html") %>%
  kableExtra::kable_styling(font_size = 12) %>%
  kableExtra::column_spec(1, bold = TRUE, color = "#444444") %>%
  kableExtra::scroll_box(width = "100%")

	Information	Comprehension	Arithmetic	Similarities	Digit Span	Vocabulary	Digit Symbol	Picture Completion	Block Design	Picture Arrangement	Object Assembly
Information	1.00	0.64	0.66	0.62	0.64	0.63	0.17	0.13	0.15	0.16	0.13
Comprehension	0.64	1.00	0.65	0.62	0.66	0.67	0.12	0.17	0.13	0.15	0.17
Arithmetic	0.66	0.65	1.00	0.65	0.62	0.63	0.15	0.15	0.12	0.15	0.16
Similarities	0.62	0.62	0.65	1.00	0.64	0.66	0.13	0.17	0.14	0.16	0.15
Digit Span	0.64	0.66	0.62	0.64	1.00	0.65	0.12	0.16	0.15	0.17	0.15
Vocabulary	0.63	0.67	0.63	0.66	0.65	1.00	0.15	0.18	0.12	0.15	0.11
Digit Symbol	0.17	0.12	0.15	0.13	0.12	0.15	1.00	0.55	0.57	0.51	0.60
Picture Completion	0.13	0.17	0.15	0.17	0.16	0.18	0.55	1.00	0.52	0.57	0.53
Block Design	0.15	0.13	0.12	0.14	0.15	0.12	0.57	0.52	1.00	0.59	0.55
Picture Arrangement	0.16	0.15	0.15	0.16	0.17	0.15	0.51	0.57	0.59	1.00	0.58
Object Assembly	0.13	0.17	0.16	0.15	0.15	0.11	0.60	0.53	0.55	0.58	1.00

Filling in the upper triangle turns the data into a full correlation matrix that can be analyzed.
The row names are not required for psych::fa; they are set only for readability.
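
As a minimal sketch of the same fill-in step, here it is on a toy 3 x 3 matrix. The values are invented for illustration and are not taken from table 4.1.

```r
# Toy lower-triangular correlation matrix; values are made up for illustration
m <- matrix(c(1.0,  NA,  NA,
              0.5, 1.0,  NA,
              0.3, 0.4, 1.0),
            nrow = 3, byrow = TRUE)

# Copy the lower triangle into the upper triangle via the transpose
m[upper.tri(m)] <- t(m)[upper.tri(m)]

m
isSymmetric(m)  # the filled matrix should now be symmetric
```

The transpose is what lines up element (j, i) with position (i, j), so no explicit loop over rows and columns is needed.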

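Why the table4.1 data cannot be read with a simple call

In the figure, the top panel is table4.1 and the bottom panel is an example from T.A. Brown's CFA data.
A simple call that gives only the file name (e.g. read.delim("blahblah.txt"), read.table("blahblah.txt")) cannot read the data provided by Tenko Raykov.
Raykov's file has only one value in its first row, and read.table infers the number of columns from the first five lines of the file. R therefore cannot tell how many variables there are in total, and the read goes wrong.
The col.names = c(1:11) argument is needed to tell R explicitly that 11 variables will be read.
Brown's data has the SDs on the first line, so R can tell how many variables there are and fills the empty cells in the following rows with NA on its own. Specifying only the file name is therefore enough to open that file.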
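
The effect of col.names can be sketched without downloading anything by building a ragged lower-triangular file in memory. The 7-variable size, the 0.3 correlations, and the names tri_lines and tri are assumptions made for this illustration only.

```r
# Build a ragged lower-triangular "file" in memory (toy values, 7 variables)
p <- 7
full <- diag(p)
full[lower.tri(full)] <- 0.3
tri_lines <- vapply(seq_len(p),
                    function(i) paste(full[i, seq_len(i)], collapse = " "),
                    character(1))

# Supplying col.names tells read.table how many columns to expect;
# fill = TRUE pads the short rows with NA
tri <- read.table(text = tri_lines, fill = TRUE,
                  col.names = paste0("X", seq_len(p)))
dim(tri)
```

With seven rows the file is longer than the five lines read.table inspects, which is the situation the 11-variable table4.1 file is in.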

1.2 Starting the analysis

1.2.1 First step of the analysis: correlation, scatter plot, descriptive statistics

# Always check the correlations, scatter plots, etc. before the analysis.
psych::cor.plot(table4.1)

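Visualization of the correlation coefficients.
Variables 1-6 are correlated with each other, and so are variables 7-11, so the variables split into two clusters.
This suggests that a two-factor solution may be appropriate.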

1.2.2 Deciding the number of factors in EFA: eigenvalues, scree, parallel test

1.2.2.1 eigenvalues

eigen(table4.1)$values # eigen values

##  [1] 4.6666102 2.7762513 0.5171068 0.4945882 0.4555689 0.3974283 0.3863103
##  [8] 0.3628892 0.3471761 0.3115097 0.2845611

The eigenvalues also point to two factors: only the first two exceed 1.
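
A small sketch of how the eigenvalues can be read, using the values copied from the eigen() output above. The names n_kaiser and prop_var are introduced here for illustration; counting eigenvalues above 1 is the usual Kaiser rule of thumb, not something specific to this data set.

```r
# Eigenvalues copied from the eigen() output above
ev <- c(4.6666102, 2.7762513, 0.5171068, 0.4945882, 0.4555689, 0.3974283,
        0.3863103, 0.3628892, 0.3471761, 0.3115097, 0.2845611)

n_kaiser <- sum(ev > 1)     # number of eigenvalues above 1
prop_var <- ev / sum(ev)    # share of total variance per component

n_kaiser
round(cumsum(prop_var)[1:2], 3)  # cumulative share of the first two components
```

Because the input is a correlation matrix of 11 variables, the eigenvalues sum to 11, so each eigenvalue divided by 11 is a proportion of total variance.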

1.2.2.2 scree

scree(table4.1) # scree

The scree plot also points to two factors.

1.2.2.3 parallel test

fa.parallel(table4.1, 
            fm= "ml", 
            n.obs= 300) # parallel test

## Parallel analysis suggests that the number of factors =  2  and the number of components =  2

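The parallel analysis also points to two factors.
Based on the eigenvalues, the scree plot, and the parallel analysis, we conclude that a two-factor structure appears appropriate for the 11 variables used in the analysis.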
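
The idea behind parallel analysis can be sketched in base R. This is a simplified principal-component version under stated assumptions (200 replications, the 95th percentile as threshold, an arbitrary seed); it is not the algorithm fa.parallel runs internally, which also handles the factor (fm = "ml") case.

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible
n <- 300; p <- 11; reps <- 200

# Eigenvalues of correlation matrices computed from uncorrelated random normal data
rand_ev <- replicate(reps, eigen(cor(matrix(rnorm(n * p), n, p)),
                                 symmetric = TRUE, only.values = TRUE)$values)
threshold <- apply(rand_ev, 1, quantile, probs = 0.95)

# Observed eigenvalues copied from the eigen() output in 1.2.2.1
obs_ev <- c(4.6666102, 2.7762513, 0.5171068, 0.4945882, 0.4555689, 0.3974283,
            0.3863103, 0.3628892, 0.3471761, 0.3115097, 0.2845611)

# Retain components until an observed eigenvalue falls below the random threshold
n_retain <- which(obs_ev <= threshold)[1] - 1
n_retain
```

A component is kept only while its observed eigenvalue is larger than what random data of the same size would produce.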

1.2.3 One-factor EFA results

table4.1_efa_f1 <- psych::fa(table4.1,
                             n.obs= 300,
                             nfactors= 1,
                             fm= "mle")
print.psych(table4.1_efa_f1, 
            digits= 3,
            cut=NULL)

## Factor Analysis using method =  ml
## Call: psych::fa(r = table4.1, nfactors = 1, n.obs = 300, fm = "mle")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                       ML1     h2    u2 com
## Information         0.792 0.6272 0.373   1
## Comprehension       0.808 0.6525 0.348   1
## Arithmetic          0.797 0.6356 0.364   1
## Similarities        0.793 0.6282 0.372   1
## Digit Span          0.799 0.6380 0.362   1
## Vocabulary          0.807 0.6505 0.349   1
## Digit Symbol        0.222 0.0491 0.951   1
## Picture Completion  0.245 0.0598 0.940   1
## Block Design        0.216 0.0466 0.953   1
## Picture Arrangement 0.242 0.0584 0.942   1
## Object Assembly     0.228 0.0522 0.948   1
## 
##                  ML1
## SS loadings    4.098
## Proportion Var 0.373
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  55  and the objective function was  5.957 with Chi Square of  1754.271
## The degrees of freedom for the model are 44  and the objective function was  2.05 
## 
## The root mean square of the residuals (RMSR) is  0.217 
## The df corrected root mean square of the residuals is  0.243 
## 
## The harmonic number of observations is  300 with the empirical chi square  1559.654  with prob <  2.3e-298 
## The total number of observations was  300  with Likelihood Chi Square =  602.249  with prob <  3.98e-99 
## 
## Tucker Lewis Index of factoring reliability =  0.5884
## RMSEA index =  0.2079  and the 90 % confidence intervals are  0.1915 0.2208
## BIC =  351.283
## Fit based upon off diagonal values = 0.739
## Measures of factor score adequacy             
##                                                     ML1
## Correlation of (regression) scores with factors   0.957
## Multiple R square of scores with factors          0.916
## Minimum correlation of possible factor scores     0.832

# It is better to print the results with print.psych.

digits: number of decimal places to print
cut: threshold below which factor loadings are not printed
ML1: factor loadings
h2: communalities
u2: estimated residual variances (uniquenesses)
The output reports the TLI but not the CFI.
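
How h2, u2, and the "SS loadings" row relate can be checked by hand. The sketch below uses the communalities copied from the one-factor output above; the variable names are introduced here for illustration.

```r
# Communalities (h2) copied from the one-factor output above
h2 <- c(0.6272, 0.6525, 0.6356, 0.6282, 0.6380, 0.6505,
        0.0491, 0.0598, 0.0466, 0.0584, 0.0522)

u2 <- 1 - h2                          # uniquenesses
ss_loadings <- sum(h2)                # with one factor, h2 is the squared loading
prop_var <- ss_loadings / length(h2)  # proportion of total variance explained

round(c(ss_loadings = ss_loadings, prop_var = prop_var), 3)
round(0.792^2, 3)                     # Information: squared loading, compare with h2 = 0.6272
```

With a single factor the communality of a variable is simply its squared loading, which is why the six verbal subtests have high h2 and the five performance subtests have h2 near zero.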

### exploratory factor analysis with 1 factor (s): test of model fit
# r: correlation or raw data
# f: factor analysis loadings matrix from FA
# Results at the top of p. 69
fa.stats(r= table4.1,
         f= table4.1_efa_f1,
         n.obs= 300)

## Call: fa.stats(r = table4.1, f = table4.1_efa_f1, n.obs = 300)
## 
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the model is 44  and the fit was  2.05 
## The number of observations was  300  with Chi Square =  602.25  with prob <  4e-99 
## 
## Measures of factor score adequacy             
##  Correlation of scores with factors            0.96
## Multiple R square of scores with factors       0.92
## Minimum correlation of factor score estimates  0.83

### Chi-Square Test of Model Fit for the Baseline Model
# Results at the bottom of p. 69
cortest.bartlett(table4.1, n= 300)

## $chisq
## [1] 1754.271
## 
## $p.value
## [1] 0
## 
## $df
## [1] 55

# The EFA output of psych::fa does not report the CFI, so it has to be computed separately from the formula
CFI <- function(x){
  cfi <- 1-(x$STATISTIC-x$dof)/(x$null.chisq-x$null.dof)
  return(cfi)
}
CFI(table4.1_efa_f1)

## [1] 0.6714774

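Computed with the CFI formula. This helper does not truncate the result to the range [0, 1], so it can exceed 1 when the chi-square is smaller than its degrees of freedom.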

1.2.4 Two-factor EFA results

table4.1_efa_f2 <- psych::fa(table4.1,
                             n.obs= 300, 
                             nfactors= 2,
                             rotate= "geominQ",
                             fm= "mle")
print.psych(table4.1_efa_f2, 
            digits= 3,
            cut= NULL)

## Factor Analysis using method =  ml
## Call: psych::fa(r = table4.1, nfactors = 2, n.obs = 300, rotate = "geominQ", 
##     fm = "mle")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                        ML1    ML2    h2    u2 com
## Information          0.792  0.007 0.630 0.370   1
## Comprehension        0.810  0.002 0.657 0.343   1
## Arithmetic           0.799  0.002 0.640 0.360   1
## Similarities         0.792  0.008 0.631 0.369   1
## Digit Span           0.799  0.007 0.641 0.359   1
## Vocabulary           0.813 -0.008 0.657 0.343   1
## Digit Symbol        -0.006  0.748 0.557 0.443   1
## Picture Completion   0.028  0.713 0.519 0.481   1
## Block Design        -0.013  0.751 0.560 0.440   1
## Picture Arrangement  0.014  0.752 0.570 0.430   1
## Object Assembly     -0.003  0.763 0.580 0.420   1
## 
##                         ML1   ML2
## SS loadings           3.857 2.787
## Proportion Var        0.351 0.253
## Cumulative Var        0.351 0.604
## Proportion Explained  0.581 0.419
## Cumulative Proportion 0.581 1.000
## 
##  With factor correlations of 
##       ML1   ML2
## ML1 1.000 0.238
## ML2 0.238 1.000
## 
## Mean item complexity =  1
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  55  and the objective function was  5.957 with Chi Square of  1754.271
## The degrees of freedom for the model are 34  and the objective function was  0.107 
## 
## The root mean square of the residuals (RMSR) is  0.017 
## The df corrected root mean square of the residuals is  0.022 
## 
## The harmonic number of observations is  300 with the empirical chi square  9.942  with prob <  1 
## The total number of observations was  300  with Likelihood Chi Square =  31.513  with prob <  0.59 
## 
## Tucker Lewis Index of factoring reliability =  1.0024
## RMSEA index =  0  and the 90 % confidence intervals are  0 0.0377
## BIC =  -162.416
## Fit based upon off diagonal values = 0.998
## Measures of factor score adequacy             
##                                                     ML1   ML2
## Correlation of (regression) scores with factors   0.957 0.930
## Multiple R square of scores with factors          0.916 0.864
## Minimum correlation of possible factor scores     0.831 0.728

CFI(table4.1_efa_f2)

## [1] 1.001464
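
The value above exceeds 1 because the two-factor chi-square (31.513) is smaller than its degrees of freedom (34). The CFI is commonly reported with truncation so that it stays within [0, 1]. A minimal sketch of that variant follows; cfi_truncated is a hypothetical helper written for this note, not a psych function, and the inputs are copied from the outputs above.

```r
# Hypothetical helper: CFI with the usual truncation to [0, 1]
cfi_truncated <- function(chisq, df, null_chisq, null_df) {
  num <- max(chisq - df, 0)
  den <- max(chisq - df, null_chisq - null_df, 0)
  if (den == 0) return(1)
  1 - num / den
}

cfi_f1 <- cfi_truncated(602.249, 44, 1754.271, 55)  # one-factor model
cfi_f2 <- cfi_truncated(31.513, 34, 1754.271, 55)   # two-factor model
round(c(cfi_f1, cfi_f2), 3)
```

For the one-factor model the truncation changes nothing; for the two-factor model it caps the index at 1.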

The two-factor solution looks fine. Small factor loadings are not meaningful, so loadings below .4 are suppressed in the printout with cut= .4.
The loadings after suppression are shown below.

# print.psych(table4.1_efa_f2, cut= .4)

## Standardized loadings (pattern matrix) based upon correlation matrix
##                       ML1   ML2   h2   u2 com
## Information          0.79       0.63 0.37   1
## Comprehension        0.81       0.66 0.34   1
## Arithmetic           0.80       0.64 0.36   1
## Similarities         0.79       0.63 0.37   1
## Digit Span           0.80       0.64 0.36   1
## Vocabulary           0.81       0.66 0.34   1
## Digit Symbol               0.75 0.56 0.44   1
## Picture Completion         0.71 0.52 0.48   1
## Block Design               0.75 0.56 0.44   1
## Picture Arrangement        0.75 0.57 0.43   1
## Object Assembly            0.76 0.58 0.42   1

1.3 Miscellaneous

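General formula for the information criteria:
\(IC = -2\log L + kp\),
\[ \begin{aligned} k &= \log(n) \text{ for BIC} \\ k &= 2 \text{ for AIC} \end{aligned} \] where \(p\) is the number of parameters in the model.
The BIC reported by psych::fa follows a different formula, \(\chi^2 - df \cdot \log(n)\), so its values differ substantially from the formula above. For example, the one-factor output gives 602.249 - 44 × log(300) = 351.283. Take care when comparing it with BIC values from other software.
Twice the difference between the log-likelihoods of \(H_{1}\) and \(H_{0}\) equals the \(\chi^2\) value. See p. 70.
\(\chi^2=2NF_{ML}=2(LL_{H_{1}}-LL_{H_{0}})\)
How to obtain the SRMR or the standard errors of the loadings from psych::fa still needs some thought.
Reference: https://stats.stackexchange.com/questions/235872/what-is-the-formula-for-standardized-root-mean-residual-srmr-in-the-context-of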
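
The BIC values printed by psych::fa in sections 1.2.3 and 1.2.4 appear to be reproducible as the chi-square minus df times log(n). A quick arithmetic check, using numbers copied from those outputs (the names bic_f1 and bic_f2 are introduced here):

```r
n <- 300
# Chi-square and df copied from the psych::fa outputs in 1.2.3 and 1.2.4
bic_f1 <- 602.249 - 44 * log(n)   # one-factor model
bic_f2 <- 31.513  - 34 * log(n)   # two-factor model
round(c(bic_f1, bic_f2), 3)
```

Both results agree with the printed BIC lines (351.283 and -162.416) to three decimals.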

2 CFA estimation

2.1 chapter 4.4 (table 4.2)

table4.2 <- read.table("http://quantpsy.cau.ac.kr/wp-content/data/raykov/TABLE4_2.dat", fill=T, col.names = c(1:11))

table4.2[upper.tri(table4.2)] <- t(table4.2)[upper.tri(table4.2)]

table4.2 <- as.matrix(table4.2) # change dataframe to matrix
rownames(table4.2) <- paste("X", c(1:11), sep= "")
table4.2

##       X1   X2   X3   X4   X5   X6   X7   X8   X9  X10  X11
## X1  1.60 0.67 0.68 0.63 0.65 0.63 0.17 0.25 0.12 0.23 0.13
## X2  0.67 1.50 0.65 0.61 0.67 0.68 0.12 0.13 0.15 0.26 0.17
## X3  0.68 0.65 1.30 0.65 0.63 0.64 0.15 0.15 0.11 0.13 0.26
## X4  0.63 0.61 0.65 1.40 0.64 0.67 0.14 0.16 0.17 0.24 0.15
## X5  0.65 0.67 0.63 0.64 1.30 0.65 0.12 0.26 0.15 0.17 0.15
## X6  0.63 0.68 0.64 0.67 0.65 1.10 0.15 0.18 0.12 0.15 0.11
## X7  0.17 0.12 0.15 0.14 0.12 0.15 1.20 0.54 0.55 0.58 0.60
## X8  0.25 0.13 0.15 0.16 0.26 0.18 0.54 1.10 0.52 0.61 0.57
## X9  0.12 0.15 0.11 0.17 0.15 0.12 0.55 0.52 0.94 0.58 0.56
## X10 0.23 0.26 0.13 0.24 0.17 0.15 0.58 0.61 0.58 0.99 0.59
## X11 0.13 0.17 0.26 0.15 0.15 0.11 0.60 0.57 0.56 0.59 0.99

To estimate a CFA in lavaan from a covariance matrix:
1. The covariance matrix must be a matrix, not a data frame. as.matrix
2. The colnames and rownames must be identical. rownames, colnames
Using the actual variable names (e.g., picture arrangement…) also works, but it makes the model syntax cumbersome, so X1, X2, …, X11 are used here.
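
The two requirements above can be checked before calling lavaan. The sketch below uses a hypothetical helper, check_cov_input, which is not part of lavaan, and a toy 2 x 2 covariance matrix with made-up values.

```r
# Hypothetical helper (not part of lavaan): checks the requirements listed above
check_cov_input <- function(S) {
  c(is_matrix    = is.matrix(S),
    names_match  = !is.null(rownames(S)) && identical(rownames(S), colnames(S)),
    is_symmetric = is.matrix(S) && isSymmetric(unname(S)))
}

# Toy 2 x 2 covariance matrix with made-up values, stored as a data frame
S_df <- data.frame(X1 = c(1.6, 0.7), X2 = c(0.7, 1.5))
check_cov_input(S_df)   # fails: still a data frame, row names are "1", "2"

S <- as.matrix(S_df)
rownames(S) <- colnames(S)
check_cov_input(S)      # passes after conversion and renaming
```

The symmetry check is an extra convenience added here; it catches a lower-triangular matrix whose upper triangle was never filled in.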

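# Note: by default lavaan fixes the loadings of X1 and X7 (the first indicator of each factor) to 1 and estimates the variances of F1 and F2 as free parameters
# !!!! This differs from the textbook model !!!!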
table4.2_model_a <- 
'
F1 =~ X1 + X2 + X3 + X4 + X5 + X6 
F2 =~  X7 + X8 + X9 + X10 + X11
'

# To avoid fixing the loadings of X1 and X7, put NA* in front of the first indicator of each factor. NA* means "estimate this parameter freely".
# Fix the variances of F1 and F2 to 1.
# This is the table 4.2 model.
table4.2_model_b <- 
  '
# Factor Structure
    F1 =~ NA*X1 + X2 + X3 + X4 + X5 + X6 
    F2 =~  NA*X7 + X8 + X9 + X10 + X11
# Factor Variance
    F1 ~~ 1*F1
    F2 ~~ 1*F2
'

# The mimic= "Mplus" option makes the output resemble that of Mplus.
# Some results are not produced without the mimic option, so using mimic by default is recommended where possible.
table4.2_fit_b <- lavaan::cfa(model= table4.2_model_b,          
                          sample.cov= table4.2,
                          sample.nobs= 300,
                          mimic= "Mplus")

summary(table4.2_fit_b, 
          standardized= F, 
          fit.measures= TRUE, 
          rsq= TRUE, 
          modindices= F)

## lavaan 0.6-3 ended normally after 15 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         34
## 
##   Number of observations                           300
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                      52.918
##   Degrees of freedom                                43
##   P-value (Chi-square)                           0.143
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic             1311.378
##   Degrees of freedom                                55
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.992
##   Tucker-Lewis Index (TLI)                       0.990
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -4351.903
##   Loglikelihood unrestricted model (H1)      -4325.444
## 
##   Number of free parameters                         34
##   Akaike (AIC)                                8771.807
##   Bayesian (BIC)                              8897.735
##   Sample-size adjusted Bayesian (BIC)         8789.907
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.028
##   90 Percent Confidence Interval          0.000  0.050
##   P-value RMSEA <= 0.05                          0.948
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.025
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Observed information based on                Hessian
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   F1 =~                                               
##     X1                0.805    0.070   11.497    0.000
##     X2                0.818    0.067   12.233    0.000
##     X3                0.800    0.061   13.034    0.000
##     X4                0.798    0.064   12.381    0.000
##     X5                0.801    0.061   13.065    0.000
##     X6                0.809    0.055   14.830    0.000
##   F2 =~                                               
##     X7                0.749    0.059   12.691    0.000
##     X8                0.743    0.056   13.304    0.000
##     X9                0.723    0.051   14.280    0.000
##     X10               0.790    0.051   15.551    0.000
##     X11               0.765    0.051   14.876    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   F1 ~~                                               
##     F2                0.268    0.063    4.228    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .X1                0.000    0.073    0.000    1.000
##    .X2                0.000    0.071    0.000    1.000
##    .X3                0.000    0.066    0.000    1.000
##    .X4                0.000    0.068    0.000    1.000
##    .X5                0.000    0.066    0.000    1.000
##    .X6                0.000    0.060    0.000    1.000
##    .X7                0.000    0.063    0.000    1.000
##    .X8                0.000    0.060    0.000    1.000
##    .X9                0.000    0.056    0.000    1.000
##    .X10               0.000    0.057    0.000    1.000
##    .X11               0.000    0.057    0.000    1.000
##     F1                0.000                           
##     F2                0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     F1                1.000                           
##     F2                1.000                           
##    .X1                0.946    0.088   10.776    0.000
##    .X2                0.826    0.079   10.515    0.000
##    .X3                0.656    0.065   10.147    0.000
##    .X4                0.759    0.073   10.456    0.000
##    .X5                0.654    0.064   10.146    0.000
##    .X6                0.442    0.049    9.059    0.000
##    .X7                0.635    0.060   10.512    0.000
##    .X8                0.545    0.053   10.257    0.000
##    .X9                0.414    0.042    9.763    0.000
##    .X10               0.363    0.041    8.807    0.000
##    .X11               0.401    0.043    9.317    0.000
## 
## R-Square:
##                    Estimate
##     X1                0.407
##     X2                0.448
##     X3                0.494
##     X4                0.456
##     X5                0.495
##     X6                0.597
##     X7                0.469
##     X8                0.503
##     X9                0.558
##     X10               0.632
##     X11               0.594
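
Several fit lines in the output above can be reproduced from the two log-likelihoods. The sketch below is plain arithmetic on numbers copied from that output; it assumes, as mimic= "Mplus" implies here, that the 11 intercepts count among the 34 free parameters.

```r
ll_h0 <- -4351.903   # Loglikelihood user model (H0)
ll_h1 <- -4325.444   # Loglikelihood unrestricted model (H1)
n <- 300; n_par <- 34; n_obs_var <- 11

chisq <- 2 * (ll_h1 - ll_h0)
aic   <- -2 * ll_h0 + 2 * n_par
bic   <- -2 * ll_h0 + log(n) * n_par
# Sample moments: 11 means + 11 * 12 / 2 variances and covariances
df    <- n_obs_var + n_obs_var * (n_obs_var + 1) / 2 - n_par

round(c(chisq = chisq, aic = aic, bic = bic, df = df), 3)
```

The results match the printed test statistic (52.918), AIC (8771.807), BIC (8897.735), and degrees of freedom (43) up to rounding of the log-likelihoods.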

2.2 discrete components

2.2.1 Data and model

chapter4dat <- read.table("http://quantpsy.cau.ac.kr/wp-content/data/raykov/CHAPTER_4.DAT")
colnames(chapter4dat) <- paste0("ITEM",1:6)
chapter4dat_model <- 
'
F1 =~ ITEM1 + ITEM2 + ITEM3 + ITEM4 + ITEM5 + ITEM6
'
chapter4dat[,c("ITEM1","ITEM2","ITEM3",'ITEM4',"ITEM5","ITEM6")] <- 
  lapply(chapter4dat[,c("ITEM1","ITEM2","ITEM3",'ITEM4',"ITEM5","ITEM6")], ordered)

The ordered function converts the items to ordered categorical variables.

# summary of categorical data proportions
tidyr::gather(chapter4dat, variable, value,1:6) %>% table %>% prop.table(1) %>% t %>% print(digits=2)

##      variable
## value ITEM1 ITEM2 ITEM3 ITEM4 ITEM5 ITEM6
##     0 0.160 0.461 0.419 0.452 0.471 0.485
##     1 0.231 0.045 0.055 0.055 0.039 0.038
##     2 0.286 0.264 0.244 0.213 0.213 0.200
##     3 0.323 0.231 0.282 0.281 0.277 0.277

chapter4_fit <- lavaan::cfa(chapter4dat_model,
                            chapter4dat,
                            estimator= "WLSMV",
                            ordered= c("ITEM1","ITEM2","ITEM3",'ITEM4',"ITEM5","ITEM6"),
                            mimic= "Mplus") # WRMR is not reported without the mimic= "Mplus" option

summary(chapter4_fit,standardized= T, 
        fit.measures= TRUE,
        rsq= TRUE, 
        modindices= F)

## lavaan 0.6-3 ended normally after 17 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                         24
## 
##   Number of observations                           823
## 
##   Estimator                                       DWLS      Robust
##   Model Fit Test Statistic                       4.009       9.056
##   Degrees of freedom                                 9           9
##   P-value (Chi-square)                           0.911       0.432
##   Scaling correction factor                                  0.452
##   Shift parameter                                            0.190
##     for simple second-order correction (WLSMV)
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic             7897.208    4964.332
##   Degrees of freedom                                15          15
##   P-value                                        0.000       0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    1.000       1.000
##   Tucker-Lewis Index (TLI)                       1.001       1.000
## 
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000       0.003
##   90 Percent Confidence Interval          0.000  0.015       0.000  0.039
##   P-value RMSEA <= 0.05                          1.000       0.992
## 
##   Robust RMSEA                                                  NA
##   90 Percent Confidence Interval                             0.000     NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.011       0.011
## 
## Weighted Root Mean Square Residual:
## 
##   WRMR                                           0.349       0.349
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model        Unstructured
##   Standard Errors                           Robust.sem
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   F1 =~                                                                 
##     ITEM1             1.000                               0.818    0.818
##     ITEM2             0.995    0.032   31.290    0.000    0.813    0.813
##     ITEM3             1.017    0.029   34.488    0.000    0.831    0.831
##     ITEM4             1.040    0.029   35.678    0.000    0.850    0.850
##     ITEM5             0.943    0.032   29.454    0.000    0.771    0.771
##     ITEM6             0.904    0.033   27.519    0.000    0.739    0.739
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .ITEM1             0.000                               0.000    0.000
##    .ITEM2             0.000                               0.000    0.000
##    .ITEM3             0.000                               0.000    0.000
##    .ITEM4             0.000                               0.000    0.000
##    .ITEM5             0.000                               0.000    0.000
##    .ITEM6             0.000                               0.000    0.000
##     F1                0.000                               0.000    0.000
## 
## Thresholds:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     ITEM1|t1         -0.993    0.053  -18.892    0.000   -0.993   -0.993
##     ITEM1|t2         -0.276    0.044   -6.224    0.000   -0.276   -0.276
##     ITEM1|t3          0.459    0.045   10.092    0.000    0.459    0.459
##     ITEM2|t1         -0.099    0.044   -2.263    0.024   -0.099   -0.099
##     ITEM2|t2          0.014    0.044    0.313    0.754    0.014    0.014
##     ITEM2|t3          0.736    0.048   15.228    0.000    0.736    0.736
##     ITEM3|t1         -0.204    0.044   -4.627    0.000   -0.204   -0.204
##     ITEM3|t2         -0.066    0.044   -1.497    0.134   -0.066   -0.066
##     ITEM3|t3          0.577    0.046   12.415    0.000    0.577    0.577
##     ITEM4|t1         -0.121    0.044   -2.750    0.006   -0.121   -0.121
##     ITEM4|t2          0.017    0.044    0.383    0.702    0.017    0.017
##     ITEM4|t3          0.581    0.047   12.483    0.000    0.581    0.581
##     ITEM5|t1         -0.072    0.044   -1.636    0.102   -0.072   -0.072
##     ITEM5|t2          0.026    0.044    0.592    0.554    0.026    0.026
##     ITEM5|t3          0.592    0.047   12.686    0.000    0.592    0.592
##     ITEM6|t1         -0.038    0.044   -0.870    0.384   -0.038   -0.038
##     ITEM6|t2          0.056    0.044    1.288    0.198    0.056    0.056
##     ITEM6|t3          0.592    0.047   12.686    0.000    0.592    0.592
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .ITEM1             0.332                               0.332    0.332
##    .ITEM2             0.338                               0.338    0.338
##    .ITEM3             0.309                               0.309    0.309
##    .ITEM4             0.278                               0.278    0.278
##    .ITEM5             0.406                               0.406    0.406
##    .ITEM6             0.454                               0.454    0.454
##     F1                0.668    0.029   22.703    0.000    1.000    1.000
## 
## Scales y*:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     ITEM1             1.000                               1.000    1.000
##     ITEM2             1.000                               1.000    1.000
##     ITEM3             1.000                               1.000    1.000
##     ITEM4             1.000                               1.000    1.000
##     ITEM5             1.000                               1.000    1.000
##     ITEM6             1.000                               1.000    1.000
## 
## R-Square:
##                    Estimate
##     ITEM1             0.668
##     ITEM2             0.662
##     ITEM3             0.691
##     ITEM4             0.722
##     ITEM5             0.594
##     ITEM6             0.546

The WRMR is not reported unless the mimic option is used.
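
The thresholds in the output are consistent with standard-normal quantiles of the cumulative category proportions. A small check for ITEM1, using the proportions copied from the proportion table in 2.2.1 (the names p_item1 and tau are introduced here):

```r
# Category proportions of ITEM1 (categories 0, 1, 2, 3) copied from the table in 2.2.1
p_item1 <- c(0.160, 0.231, 0.286, 0.323)

# Thresholds as standard-normal quantiles of the cumulative proportions
tau <- qnorm(cumsum(p_item1)[1:3])
round(tau, 3)
```

The three values agree with ITEM1|t1, ITEM1|t2, and ITEM1|t3 in the output (-0.993, -0.276, 0.459) up to the rounding of the proportions.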

Introduction to Psychometric Theory

Yoonsik Yang
