1. Data Preparation

dta_sleep <- subset(dta,select=c(Sleep1,Sleep2,Sleep3,Sleep4,Sleep5))
head(dta_sleep)
##   Sleep1 Sleep2 Sleep3 Sleep4 Sleep5
## 1      1      2      0      1      2
## 2      2      2      1      0      1
## 3      1      0      2      1      1
## 4      2      1      2      2      2
## 5      1      1      0      1      1
## 6      1      1      2      0      1

2. Exploratory Factor Analysis (EFA)

library(pacman)
p_load("psych")

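This part uses the psych package.

The KMO measure and Bartlett's test of sphericity are used to judge whether the data are suitable for factor analysis: KMO values above 0.6 are acceptable, and a Bartlett p < 0.01 indicates that factor analysis is appropriate. Here the overall MSA (0.65) and every item MSA (0.61 to 0.71) exceed 0.6, and the Bartlett p-value is far below 0.01.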

KMO(dta_sleep)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = dta_sleep)
## Overall MSA =  0.65
## MSA for each item = 
## Sleep1 Sleep2 Sleep3 Sleep4 Sleep5 
##   0.63   0.71   0.61   0.66   0.64
cortest.bartlett(dta_sleep)
## R was not square, finding R from data
## $chisq
## [1] 296.117
## 
## $p.value
## [1] 1.028829e-57
## 
## $df
## [1] 10
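To make the two checks above less of a black box, here is a minimal base-R sketch that computes a KMO measure and Bartlett's test directly from a correlation matrix. It runs on simulated data, not on dta_sleep, and the names X, kmo_overall, kmo_items and bartlett_chisq are illustrative assumptions rather than psych's internals.

```r
# Simulated stand-in data (an assumption): 5 items driven by one common factor
set.seed(1)
n <- 300
p <- 5
f <- rnorm(n)
X <- sapply(seq_len(p), function(j) 0.6 * f + 0.8 * rnorm(n))
colnames(X) <- paste0("Item", seq_len(p))

R <- cor(X)
Rinv <- solve(R)

# Partial (anti-image) correlations from the inverse correlation matrix
Q <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))
diag(Q) <- 0
R0 <- R
diag(R0) <- 0

# KMO: squared correlations relative to squared correlations plus squared partials
kmo_overall <- sum(R0^2) / (sum(R0^2) + sum(Q^2))
kmo_items <- colSums(R0^2) / (colSums(R0^2) + colSums(Q^2))

# Bartlett's test of sphericity: H0 is that R is an identity matrix
bartlett_chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
bartlett_df <- p * (p - 1) / 2
bartlett_p <- pchisq(bartlett_chisq, bartlett_df, lower.tail = FALSE)

print(round(c(overall = kmo_overall, kmo_items), 2))
print(c(chisq = bartlett_chisq, df = bartlett_df, p.value = bartlett_p))
```

With five items the degrees of freedom are 5 * 4 / 2 = 10, which matches the $df value in the output above.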

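First, try a single factor. Interpretation: the multiple R square of the factor scores with the factor is 0.61, while the factor itself accounts for 23% of the total item variance (Proportion Var). h2 is the communality, the share of each item's variance that the factor explains; Sleep1 (Satisfaction) is highest at about 38%. u2 is the uniqueness, the share of variance left unexplained; for Sleep3 (Timing: asleep between 2 and 4 a.m. more than half the time) as much as 91% is unexplained. Timing therefore does not look like a good item.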

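# fm: factor extraction method ("pa" = principal axis); nfactors: number of factors to extract
# rotate: rotation method ("varimax" = variance-maximizing rotation); cut: hide loadings below .3
print.psych(fa(dta_sleep, fm = "pa", nfactors = 1, rotate = "varimax"), cut = .3)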
## Factor Analysis using method =  pa
## Call: fa(r = dta_sleep, nfactors = 1, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##         PA1    h2   u2 com
## Sleep1 0.61 0.378 0.62   1
## Sleep2 0.38 0.141 0.86   1
## Sleep3      0.087 0.91   1
## Sleep4 0.48 0.234 0.77   1
## Sleep5 0.55 0.299 0.70   1
## 
##                 PA1
## SS loadings    1.14
## Proportion Var 0.23
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## df null model =  10  with the objective function =  0.46 with Chi Square =  296.12
## df of  the model are 5  and the objective function was  0.07 
## 
## The root mean square of the residuals (RMSR) is  0.07 
## The df corrected root mean square of the residuals is  0.1 
## 
## The harmonic n.obs is  647 with the empirical chi square  59.73  with prob <  1.4e-11 
## The total n.obs was  647  with Likelihood Chi Square =  47.39  with prob <  4.7e-09 
## 
## Tucker Lewis Index of factoring reliability =  0.703
## RMSEA index =  0.114  and the 90 % confidence intervals are  0.086 0.145
## BIC =  15.03
## Fit based upon off diagonal values = 0.91
## Measures of factor score adequacy             
##                                                    PA1
## Correlation of (regression) scores with factors   0.78
## Multiple R square of scores with factors          0.61
## Minimum correlation of possible factor scores     0.23
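The relationship between h2, u2, and the summary rows can be checked by hand. The sketch below takes the h2 column from the one-factor output above and recomputes the other quantities; the variable names are illustrative only.

```r
# Communalities (h2) copied from the one-factor output above
h2 <- c(Sleep1 = 0.378, Sleep2 = 0.141, Sleep3 = 0.087,
        Sleep4 = 0.234, Sleep5 = 0.299)

# Uniqueness is the variance the factor does not explain
u2 <- 1 - h2

# With a single factor, the absolute loading is the square root of h2
implied_loading <- sqrt(h2)

# SS loadings is the sum of communalities; dividing by the number of items
# gives the proportion of total variance explained
ss_loadings <- sum(h2)
prop_var <- ss_loadings / length(h2)

print(round(cbind(loading = implied_loading, h2 = h2, u2 = u2), 2))
print(round(c(SS.loadings = ss_loadings, Proportion.Var = prop_var), 2))
```

The implied loading for Sleep3 is just under 0.3, which is why the cut = .3 argument leaves its PA1 cell blank in the printed table.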

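Next, try two factors. Interpretation: Sleep2 (Alertness) drops out, loading on neither factor above the .3 cutoff, and it has the lowest communality (h2 = 0.12), with Sleep3 (Timing) close behind (h2 = 0.14). The "maximum iteration exceeded" warning means this solution should be read with some caution.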

print.psych(fa(dta_sleep, fm = "pa", nfactors = 2, rotate = "varimax"), cut = .3)
## maximum iteration exceeded
## Factor Analysis using method =  pa
## Call: fa(r = dta_sleep, nfactors = 2, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##         PA1  PA2   h2   u2 com
## Sleep1 0.82      0.69 0.31 1.0
## Sleep2           0.12 0.88 1.8
## Sleep3      0.36 0.14 0.86 1.1
## Sleep4      0.76 0.61 0.39 1.1
## Sleep5 0.48      0.26 0.74 1.3
## 
##                        PA1  PA2
## SS loadings           1.03 0.78
## Proportion Var        0.21 0.16
## Cumulative Var        0.21 0.36
## Proportion Explained  0.57 0.43
## Cumulative Proportion 0.57 1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 2 factors are sufficient.
## 
## df null model =  10  with the objective function =  0.46 with Chi Square =  296.12
## df of  the model are 1  and the objective function was  0.01 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic n.obs is  647 with the empirical chi square  3.98  with prob <  0.046 
## The total n.obs was  647  with Likelihood Chi Square =  3.43  with prob <  0.064 
## 
## Tucker Lewis Index of factoring reliability =  0.915
## RMSEA index =  0.061  and the 90 % confidence intervals are  0 0.138
## BIC =  -3.04
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    PA1  PA2
## Correlation of (regression) scores with factors   0.84 0.78
## Multiple R square of scores with factors          0.71 0.60
## Minimum correlation of possible factor scores     0.41 0.21
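Since both calls above request rotate = "varimax", it may help to see what an orthogonal rotation does and does not change. The sketch below applies base R's stats::varimax() to a hypothetical unrotated loading matrix (the numbers are made up for illustration, not taken from dta_sleep) and checks that the communalities stay the same.

```r
# Hypothetical unrotated loadings for 5 items on 2 factors (illustrative only)
L <- matrix(c(0.70, 0.30,
              0.60, 0.35,
              0.35, 0.65,
              0.30, 0.70,
              0.55, 0.40),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("Item", 1:5), c("F1", "F2")))

# Varimax looks for an orthogonal rotation that makes each factor's
# squared loadings as uneven as possible (simple structure)
rot <- varimax(L)
L_rot <- unclass(rot$loadings)

# Communalities are row sums of squared loadings, before and after rotation
h2_before <- rowSums(L^2)
h2_after <- rowSums(L_rot^2)

print(round(L_rot, 2))
print(round(cbind(h2_before, h2_after), 3))
```

Rotation redistributes variance between factors but leaves each item's h2 and u2 untouched, so the low communalities of Sleep2 and Sleep3 would not be fixed by choosing a different rotation.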

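Parallel analysis can be used to decide how many factors to extract. Because fa = "pc" is requested, the result refers to principal components, and the plot suggests keeping one: a component is retained when the eigenvalue from the real data exceeds the corresponding mean eigenvalue from random data matrices.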

fa.parallel(dta_sleep, fa = "pc", show.legend = TRUE, n.iter = 100)

## Parallel analysis suggests that the number of factors =  NA  and the number of components =  1
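The retention rule described above can be sketched in a few lines of base R. This is a simplified illustration on simulated one-factor data, comparing against the mean random eigenvalue as the text describes; the criterion psych applies may differ in detail, and the names X, real_eig, rand_eig and n_keep are assumptions.

```r
# Simulated stand-in data (an assumption): 5 items sharing one common factor
set.seed(42)
n <- 647
p <- 5
f <- rnorm(n)
X <- sapply(seq_len(p), function(j) 0.6 * f + 0.8 * rnorm(n))

# Eigenvalues of the observed correlation matrix
real_eig <- eigen(cor(X), only.values = TRUE)$values

# Mean eigenvalues over correlation matrices of pure-noise data of the same size
n_iter <- 100
rand_eig <- rowMeans(replicate(n_iter, {
  noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(noise), only.values = TRUE)$values
}))

# Keep leading components only while the real eigenvalue beats the random one
n_keep <- sum(cumprod(real_eig > rand_eig))

print(round(rbind(real = real_eig, random = rand_eig), 2))
print(n_keep)
```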

3. Confirmatory Factor Analysis (CFA)

This part uses the lavaan package. Reference: Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36. URL http://www.jstatsoft.org/v48/i02/

library(pacman)
p_load("lavaan", "semPlot")
sleep.model<- 'sleep=~Sleep1+Sleep2+Sleep3+Sleep4+Sleep5'
fit <- cfa(sleep.model, data=dta_sleep)


Interpreting the results: CFI is 0.856, short of the 0.90 threshold, whereas SRMR is 0.057, below 0.08 and therefore acceptable.
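For readers who want to see where the incremental fit indices come from, the sketch below recomputes CFI, TLI, and RMSEA from the two chi-square statistics reported in the lavaan summary in this section. The formulas are the standard textbook definitions; the RMSEA here divides by N, and dividing by N - 1 instead gives the same value to three decimals for these data.

```r
# Values copied from the lavaan summary in this section
chisq_m <- 46.388   # user model test statistic
df_m    <- 5
chisq_b <- 297.728  # baseline model test statistic
df_b    <- 10
n_obs   <- 647

# Noncentrality estimates, floored at zero
d_m <- max(chisq_m - df_m, 0)
d_b <- max(chisq_b - df_b, 0)

# CFI: relative reduction in noncentrality versus the baseline model
cfi <- 1 - d_m / max(d_b, d_m)

# TLI: compares chi-square/df ratios, penalizing model complexity
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

# RMSEA: misfit per degree of freedom, scaled by sample size
rmsea <- sqrt(d_m / (df_m * n_obs))

print(round(c(CFI = cfi, TLI = tli, RMSEA = rmsea), 3))
```

These reproduce the reported CFI of 0.856, TLI of 0.712, and RMSEA of 0.113.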

sleep.model<- 'sleep=~Sleep1+Sleep2+Sleep3+Sleep4+Sleep5'
cfafit = cfa(model = sleep.model,data = dta_sleep)
summary(cfafit,
        fit.measures = TRUE,
        standardized = TRUE)
## lavaan 0.6.15 ended normally after 32 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        10
## 
##   Number of observations                           647
## 
## Model Test User Model:
##                                                       
##   Test statistic                                46.388
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               297.728
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.856
##   Tucker-Lewis Index (TLI)                       0.712
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3294.993
##   Loglikelihood unrestricted model (H1)      -3271.799
##                                                       
##   Akaike (AIC)                                6609.985
##   Bayesian (BIC)                              6654.709
##   Sample-size adjusted Bayesian (SABIC)       6622.959
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.113
##   90 Percent confidence interval - lower         0.085
##   90 Percent confidence interval - upper         0.144
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    0.971
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.057
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   sleep =~                                                              
##     Sleep1            1.000                               0.403    0.646
##     Sleep2            0.611    0.095    6.416    0.000    0.246    0.371
##     Sleep3            0.507    0.102    4.964    0.000    0.204    0.267
##     Sleep4            0.870    0.122    7.159    0.000    0.351    0.439
##     Sleep5            0.919    0.118    7.811    0.000    0.371    0.574
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Sleep1            0.227    0.023    9.759    0.000    0.227    0.583
##    .Sleep2            0.381    0.023   16.228    0.000    0.381    0.863
##    .Sleep3            0.544    0.032   17.152    0.000    0.544    0.929
##    .Sleep4            0.516    0.034   15.304    0.000    0.516    0.807
##    .Sleep5            0.280    0.023   12.118    0.000    0.280    0.671
##     sleep             0.163    0.026    6.173    0.000    1.000    1.000
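dev.new()
semPaths(cfafit, what = "std", # show standardized estimates; use what = "par" for the raw estimates
         rotation = 2, # latent variable on the left, observed variables on the right
         edge.color = "black",
         esize = 0.5, # thickness of the arrows
         edge.label.cex = 1, # font size of all edge labels
         exoVar = FALSE) # do not show the variances of exogenous variables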

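Conclusion:

Overall, our data are consistent with a single factor, and the CFA results are only marginally acceptable. In the path diagram, the number on each arrow is a standardized coefficient; the larger the coefficient, the darker the line and its label. The residual variances of Sleep3 (Timing) and Sleep4 (Efficiency) are drawn almost transparent. The better items are Sleep1 (Satisfaction) and Sleep5 (Duration), with standardized loadings of 0.646 and 0.574.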
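As a quick consistency check on the numbers in the path diagram, the sketch below takes the Std.all columns from the lavaan output above and confirms that, for a one-factor model, each squared standardized loading plus the standardized residual variance is approximately 1. The variable names are illustrative only.

```r
# Std.all values copied from the lavaan output above
std_loading <- c(Sleep1 = 0.646, Sleep2 = 0.371, Sleep3 = 0.267,
                 Sleep4 = 0.439, Sleep5 = 0.574)
std_resid   <- c(Sleep1 = 0.583, Sleep2 = 0.863, Sleep3 = 0.929,
                 Sleep4 = 0.807, Sleep5 = 0.671)

# Share of each item's variance explained by the latent factor
explained <- std_loading^2

# Explained plus residual should sum to about 1, up to rounding in the printout
total <- explained + std_resid

print(round(cbind(loading = std_loading, explained, residual = std_resid, total), 3))

# Items ordered from most to least variance explained
print(names(sort(explained, decreasing = TRUE)))
```

The ordering puts Sleep1 and Sleep5 first, in line with the conclusion above.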