EFA

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the written content and the output of any embedded R code chunks. You can embed an R code chunk like this:

library(readr)
data <- read_csv("~/Desktop/Qian Thesis 13 features.csv")

## Rows: 1419 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): story
## dbl (13): avg_sentence_length, coherence, consistency, distinct_2, lm_ppl, t...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(data)

## spc_tbl_ [1,419 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ story               : chr [1:1419] "Pam had a jam in a pan. The jam was pink. Tim had a cat. The cat had a bat. Tim and Pam sat. They sat with Matt"| __truncated__ "Pat the cat sat on a mat. A bat flew and sat on Pat. Pat the cat did not like the bat. Pat swat at the bat. The"| __truncated__ "Pam has a pet cat. The cat's name is Pat. Pat is a fat cat. Pam and Pat play in the sun. Pat can nap on a mat. "| __truncated__ "Pam the cat has a hat. Pam's hat is tan and fits just right. She naps with her hat on a mat. Matt the rat is Pa"| __truncated__ ...
##  $ avg_sentence_length : num [1:1419] 6.33 8.62 6.93 8.23 8.94 ...
##  $ coherence           : num [1:1419] 0.0909 0.625 0.1429 0.1538 0 ...
##  $ consistency         : num [1:1419] 0.371 0.417 0.371 0.421 0.216 ...
##  $ distinct_2          : num [1:1419] 0.84 0.809 0.864 0.887 0.869 ...
##  $ lm_ppl              : num [1:1419] 25.7 25.1 20.8 38.7 28.5 ...
##  $ text_novelty        : num [1:1419] 0.0267 0.1176 0.1068 0.2453 0.2188 ...
##  $ flesch              : num [1:1419] 135 122 129 123 130 ...
##  $ spache              : num [1:1419] 1.53 1.91 1.72 1.92 2.05 ...
##  $ toxicity            : num [1:1419] 0.0561 0.0122 0.0272 0.0339 0.014 ...
##  $ syntactic_complexity: num [1:1419] 2.05 2.44 2.25 2.04 3.43 ...
##  $ self_bleu           : num [1:1419] 0.041182 0.062214 0.000638 0.0005 0.000951 ...
##  $ factual_hint        : num [1:1419] 0.615 0.658 0.701 0.673 0.797 ...
##  $ semantic_drift      : num [1:1419] 0.66 0.554 0.68 0.622 0.733 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   story = col_character(),
##   ..   avg_sentence_length = col_double(),
##   ..   coherence = col_double(),
##   ..   consistency = col_double(),
##   ..   distinct_2 = col_double(),
##   ..   lm_ppl = col_double(),
##   ..   text_novelty = col_double(),
##   ..   flesch = col_double(),
##   ..   spache = col_double(),
##   ..   toxicity = col_double(),
##   ..   syntactic_complexity = col_double(),
##   ..   self_bleu = col_double(),
##   ..   factual_hint = col_double(),
##   ..   semantic_drift = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Drop the text column and z-score the 13 numeric features (mean 0, SD 1).
scaled_data <- data %>%
  select(-story) %>%
  scale()
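As a quick sanity check on the standardization step, the sketch below confirms that every column has mean 0 and standard deviation 1. The helper name `check_standardized` and its tolerance are assumptions introduced for illustration; they are not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis): returns TRUE when every
# column has mean ~0 and standard deviation ~1, within a small tolerance.
check_standardized <- function(m, tol = 1e-8) {
  m <- as.matrix(m)
  col_means <- colMeans(m, na.rm = TRUE)
  col_sds <- apply(m, 2, sd, na.rm = TRUE)
  all(abs(col_means) < tol) && all(abs(col_sds - 1) < tol)
}

# Runs only if the document's scaled_data object is in the workspace.
if (exists("scaled_data")) {
  print(check_standardized(scaled_data))
}
```

Since `fa()` reports loadings "based upon correlation matrix", standardizing beforehand should not change the factor solution; it mainly keeps the input on a common scale for any other use.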

library(psych)

## Warning: package 'psych' was built under R version 4.3.3

fa.parallel(scaled_data, fa = "fa", n.iter = 100, show.legend = FALSE)

## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA
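To make the idea behind parallel analysis concrete, here is a minimal base-R sketch. It is a simplified variant that compares principal-component eigenvalues of the observed correlation matrix against eigenvalues from random normal data of the same shape; it is not the factor-based procedure that `fa.parallel(..., fa = "fa")` runs, so its answer may differ from the 5 factors reported above. The function name `parallel_pca` and its arguments are hypothetical.

```r
# Simplified parallel analysis on principal-component eigenvalues.
# Hypothetical helper; NOT the algorithm psych::fa.parallel uses for fa = "fa".
parallel_pca <- function(x, n_iter = 100, quantile_level = 0.95) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)

  # Eigenvalues of the observed correlation matrix, largest first.
  observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

  # Eigenvalues from uncorrelated normal noise with the same dimensions;
  # the result is a p x n_iter matrix (one column per simulated data set).
  simulated <- replicate(n_iter, {
    noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(noise), symmetric = TRUE, only.values = TRUE)$values
  })

  # Per-position threshold: the chosen quantile across simulations.
  threshold <- apply(simulated, 1, quantile, probs = quantile_level)

  # Keep the leading run of eigenvalues that exceed their threshold.
  above <- observed > threshold
  n_keep <- if (all(above)) p else which(!above)[1] - 1

  list(observed = observed, threshold = threshold, n_components = n_keep)
}

# Runs only if the document's scaled_data object is in the workspace.
if (exists("scaled_data")) {
  print(parallel_pca(scaled_data)$n_components)
}
```

Results vary slightly between runs because the noise is random; call `set.seed()` first for reproducibility.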

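# Fit 3 factors with principal-axis factoring (fm = "pa") and an oblique
# oblimin rotation. Note: the parallel analysis above suggested 5 factors.
efa_result <- fa(scaled_data, nfactors = 3, rotate = "oblimin", fm = "pa")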

## Loading required namespace: GPArotation

print(efa_result$loadings)

## 
## Loadings:
##                      PA1    PA2    PA3   
## avg_sentence_length   0.867 -0.103       
## coherence                   -0.346 -0.120
## consistency                 -0.967       
## distinct_2            0.536  0.438  0.148
## lm_ppl                       0.193  0.447
## text_novelty          0.171         0.665
## flesch               -0.740        -0.292
## spache                0.776         0.264
## toxicity             -0.306 -0.163       
## syntactic_complexity  0.802        -0.261
## self_bleu            -0.218 -0.289  0.118
## factual_hint                 0.927       
## semantic_drift               0.887       
## 
##                  PA1   PA2   PA3
## SS loadings    3.020 3.060 0.929
## Proportion Var 0.232 0.235 0.071
## Cumulative Var 0.232 0.468 0.539
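One way to read a loadings table like this is to assign each variable to the factor on which it loads most strongly in absolute value, leaving it unassigned when even that loading is weak. The sketch below does this in base R. The helper name `dominant_factor` and the 0.3 cutoff are assumptions chosen for illustration, not something the original analysis specifies.

```r
# Hypothetical helper: for each variable, report the factor with the largest
# absolute loading, or NA when that loading is below the cutoff.
dominant_factor <- function(loadings, cutoff = 0.3) {
  L <- as.matrix(unclass(loadings))  # strip the "loadings" class, keep dimnames
  best <- max.col(abs(L), ties.method = "first")
  peak <- L[cbind(seq_len(nrow(L)), best)]
  data.frame(
    variable = rownames(L),
    factor = ifelse(abs(peak) >= cutoff, colnames(L)[best], NA_character_),
    loading = peak,
    stringsAsFactors = FALSE
  )
}

# Runs only if the document's efa_result object is in the workspace.
if (exists("efa_result")) {
  print(dominant_factor(efa_result$loadings))
}
```

Variables with sizeable loadings on more than one factor (for example `distinct_2` above) are reduced to a single label here, so the table is a reading aid rather than a substitute for the full pattern matrix.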

print(efa_result)

## Factor Analysis using method =  pa
## Call: fa(r = scaled_data, nfactors = 3, rotate = "oblimin", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                        PA1   PA2   PA3   h2    u2 com
## avg_sentence_length   0.87 -0.10 -0.06 0.72 0.277 1.0
## coherence             0.05 -0.35 -0.12 0.13 0.867 1.3
## consistency          -0.02 -0.97  0.00 0.94 0.064 1.0
## distinct_2            0.54  0.44  0.15 0.56 0.436 2.1
## lm_ppl               -0.08  0.19  0.45 0.22 0.784 1.4
## text_novelty          0.17 -0.08  0.66 0.57 0.428 1.2
## flesch               -0.74  0.03 -0.29 0.81 0.186 1.3
## spache                0.78 -0.01  0.26 0.84 0.158 1.2
## toxicity             -0.31 -0.16  0.09 0.10 0.896 1.7
## syntactic_complexity  0.80  0.02 -0.26 0.54 0.464 1.2
## self_bleu            -0.22 -0.29  0.12 0.12 0.879 2.2
## factual_hint          0.01  0.93 -0.03 0.86 0.141 1.0
## semantic_drift       -0.10  0.89  0.00 0.80 0.201 1.0
## 
##                        PA1  PA2  PA3
## SS loadings           3.12 3.06 1.04
## Proportion Var        0.24 0.24 0.08
## Cumulative Var        0.24 0.48 0.56
## Proportion Explained  0.43 0.42 0.14
## Cumulative Proportion 0.43 0.86 1.00
## 
##  With factor correlations of 
##       PA1   PA2  PA3
## PA1  1.00 -0.01 0.42
## PA2 -0.01  1.00 0.01
## PA3  0.42  0.01 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  78  with the objective function =  8.58 with Chi Square =  12115.08
## df of  the model are 42  and the objective function was  1.14 
## 
## The root mean square of the residuals (RMSR) is  0.05 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic n.obs is  1419 with the empirical chi square  483.65  with prob <  2e-76 
## The total n.obs was  1419  with Likelihood Chi Square =  1608.08  with prob <  3.5e-310 
## 
## Tucker Lewis Index of factoring reliability =  0.758
## RMSEA index =  0.162  and the 90 % confidence intervals are  0.155 0.169
## BIC =  1303.26
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy             
##                                                    PA1  PA2  PA3
## Correlation of (regression) scores with factors   0.96 0.98 0.84
## Multiple R square of scores with factors          0.92 0.97 0.70
## Minimum correlation of possible factor scores     0.84 0.94 0.40

cor.plot(efa_result$loadings, numbers = TRUE, main = "Factor Loadings")

2025-07-01
