Behavioural stability of writing process measures

Principal component analysis

Load data:

library(tidyverse) # provides read_csv(), %>% and select_if()

data <- read_csv("../data/by_ppt_means_wide.csv") %>% 
  select_if(~ !any(is.na(.))) # remove variables that contain NAs

Run principal component analysis:

# PCA on the centred and scaled double-precision columns
pc <- prcomp(dplyr::select(data, where(is.double)), center = TRUE, scale. = TRUE)

Scree plot suggests 6 or 7 PCs.
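A scree plot of this kind can be sketched as follows. This is a minimal illustration, not the code behind the report: it uses the built-in `iris` data as a stand-in for the participant data, and the object names `pc_demo` and `var_explained` are made up for the example.

```r
# Stand-in data: the four double-precision columns of the built-in iris data
x <- iris[, sapply(iris, is.double)]
pc_demo <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pc_demo$sdev^2 / sum(pc_demo$sdev^2)
round(cumsum(var_explained), 2)

# Scree plot: look for the "elbow" where the curve flattens out
plot(var_explained, type = "b",
     xlab = "Principal component",
     ylab = "Proportion of variance explained")
```

The number of components to retain is then read off the plot, which is a judgement call and explains a range such as "6 or 7".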

These are the strongest loadings for the first three PCs. Loadings with \(|\text{loading}| < .2\) were omitted.

dv	pc	loading
is_lookback	PC1	-0.34
edit	PC1	-0.31
is_lookback_post-word_production	PC1	-0.30
is_lookback_within-word_production	PC1	-0.30
productionseq	PC1	0.30
edit_post-word	PC1	-0.30
edit_within-word	PC1	-0.29
nonlookbackseq	PC1	0.27
n_edges	PC1	-0.21
n_jumps	PC1	-0.20
log_totalfix_duration	PC2	-0.39
n_fixes	PC2	-0.38
log_totalfix_duration_post-word_production	PC2	-0.36
log_totalfix_duration_within-word_production	PC2	-0.35
n_fixes_post-word_production	PC2	-0.30
n_fixes_within-word_production	PC2	-0.29
log_totalfix_duration_post-sentence_production	PC2	-0.26
n_fixes_post-sentence_production	PC2	-0.24
log_event_duration_within-word_production	PC3	0.32
edit_within-word	PC3	-0.31
log_event_duration	PC3	0.31
edit	PC3	-0.31
log_event_duration_within-word_deletion	PC3	0.26
n_edges	PC3	-0.24
log_event_duration_post-word_production	PC3	0.24
is_lookback_within-word_production	PC3	0.23
edit_post-word	PC3	-0.22
n_jumps	PC3	-0.22
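A table like the one above could be built from the rotation matrix returned by `prcomp()`. The sketch below is an assumption about the approach, not the report's own code; it uses base R and the built-in `iris` data as a stand-in, and the names `pc_demo`, `rot`, `loadings` and `strong` are hypothetical.

```r
# Stand-in PCA on the numeric columns of the built-in iris data
pc_demo <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Long table of loadings: one row per variable-by-PC combination
rot <- pc_demo$rotation[, 1:3]
loadings <- data.frame(
  dv = rep(rownames(rot), times = ncol(rot)),
  pc = rep(colnames(rot), each = nrow(rot)),
  loading = as.vector(rot)
)

# Keep loadings with an absolute value of at least .2,
# ordered from strongest to weakest within each PC
strong <- loadings[abs(loadings$loading) >= 0.2, ]
strong <- strong[order(strong$pc, -abs(strong$loading)), ]
strong$loading <- round(strong$loading, 2)
strong
```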

Loadings arranged by PC, with weak ones omitted. Within each PC, variables are ordered from the strongest to the weakest loading. Negative loadings are indicated in red and positive loadings in blue.

PC1	PC2	PC3
is_lookback	log_totalfix_duration	log_event_duration_within-word_production
edit	n_fixes	edit_within-word
is_lookback_post-word_production	log_totalfix_duration_post-word_production	log_event_duration
is_lookback_within-word_production	log_totalfix_duration_within-word_production	edit
productionseq	n_fixes_post-word_production	log_event_duration_within-word_deletion
edit_post-word	n_fixes_within-word_production	n_edges
edit_within-word	log_totalfix_duration_post-sentence_production	log_event_duration_post-word_production
nonlookbackseq	n_fixes_post-sentence_production	is_lookback_within-word_production
n_edges	NA	edit_post-word
n_jumps	NA	n_jumps

Clustering

Hartigan’s rule suggests 4-5 clusters.
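As a rough illustration of how such a rule works: in its commonly stated form, Hartigan's index for \(k\) clusters is \(H(k) = (W_k / W_{k+1} - 1)(n - k - 1)\), where \(W_k\) is the total within-cluster sum of squares and \(n\) the number of observations, and a further cluster is added as long as \(H(k) > 10\). The sketch below applies this to simulated data; the data, `k_max` and all object names are assumptions made for the example, and the report may have used a packaged implementation instead.

```r
set.seed(1)

# Simulated stand-in for participant scores: three groups in two dimensions
x <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 5), ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
)
n <- nrow(x)
k_max <- 6

# Total within-cluster sum of squares for k = 1, ..., k_max + 1
wss <- sapply(1:(k_max + 1), function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# Hartigan's index for k = 1, ..., k_max; values above 10 favour k + 1 clusters
k <- 1:k_max
hartigan <- (wss[k] / wss[k + 1] - 1) * (n - k - 1)
data.frame(k = k, hartigan = round(hartigan, 1), add_cluster = hartigan > 10)
```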

Gap statistic suggests 2 clusters.
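One way to compute a gap statistic in R is `clusGap()` from the `cluster` package. Whether the report used this function is not stated, so the sketch below is only an assumption; it runs on simulated data, and `K.max`, `B` and the selection method passed to `maxSE()` are illustrative choices.

```r
library(cluster)

set.seed(1)

# Simulated stand-in for participant scores: three groups in two dimensions
x <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 5), ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
)

# Gap statistic for k = 1, ..., 6 using k-means with 25 random starts
gap <- clusGap(x, FUNcluster = kmeans, nstart = 25,
               K.max = 6, B = 50, verbose = FALSE)

# Suggested number of clusters under the "firstSEmax" criterion
k_hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")
k_hat
```

Different criteria in `maxSE()` can suggest different numbers of clusters, which is one reason such heuristics need not agree with one another.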

Create clusters using \(K\)-means.

k3 <- kmeans(pcs, centers = 3, nstart = 25)
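The object `pcs` is not defined in the code shown here. A minimal sketch of the whole step, assuming `pcs` holds the scores on the first three principal components, might look as follows; it uses the built-in `iris` data as a stand-in, and the names `pc_demo`, `pcs_demo` and `k3_demo` are hypothetical.

```r
# Stand-in PCA; the scores on the first three PCs play the role of `pcs`
pc_demo <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
pcs_demo <- pc_demo$x[, 1:3]

# k-means uses random starting centres, so fix the seed for reproducibility
set.seed(123)
k3_demo <- kmeans(pcs_demo, centers = 3, nstart = 25)

# Cluster sizes and mean PC scores per cluster
table(k3_demo$cluster)
aggregate(as.data.frame(pcs_demo),
          by = list(cluster = k3_demo$cluster), FUN = mean)
```

Setting `nstart = 25` reruns the algorithm from 25 random starts and keeps the solution with the lowest total within-cluster sum of squares.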

3D cluster plots

Click on plot to rotate it and zoom in and out. Every sphere represents one participant. Colour indicates cluster.

Three-cluster solution:

By-cluster means

Means of principal components by cluster.

# A tibble: 3 × 4
  pc    `Cluster 1` `Cluster 2` `Cluster 3`
  <chr>       <dbl>       <dbl>       <dbl>
1 PC1        -0.467      -1.28        4.18 
2 PC2        -1.08        2.36       -0.361
3 PC3         0.552      -0.766      -0.641

Simplified: rank the clusters by their mean score on each of the three principal components. Then create a table in which the maximum score is shown as \(+\), the minimum is shown as \(-\), and the middle one is left blank.

	Cluster 1	Cluster 2	Cluster 3
PC1		\(-\)	\(+\)
PC2	\(-\)	\(+\)
PC3	\(+\)	\(-\)
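The simplification can be reproduced from the by-cluster means reported above. The following is a sketch in base R rather than the report's own code; the object names `means` and `signs` are made up for the example.

```r
# By-cluster means of the first three PCs, copied from the table above
means <- rbind(
  PC1 = c(-0.467, -1.28, 4.18),
  PC2 = c(-1.08, 2.36, -0.361),
  PC3 = c(0.552, -0.766, -0.641)
)
colnames(means) <- paste("Cluster", 1:3)

# For each PC (row), mark the largest mean "+", the smallest "-",
# and leave the middle one blank
signs <- t(apply(means, 1, function(m) {
  ifelse(m == max(m), "+", ifelse(m == min(m), "-", ""))
}))
signs
```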

Ppts that changed cluster

Of the 30 ppts, the following changed cluster from task 1 to task 2:

[1] "F19-0038" "F20-0013" "F21-0034" "F22-0043" "M19-0006"
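Identifying such participants amounts to comparing the two cluster assignments per participant. A minimal sketch with made-up data follows; the participant IDs, cluster labels and column names are hypothetical and are not taken from the study.

```r
# Hypothetical cluster assignments for five made-up participants
clusters <- data.frame(
  ppt = c("P01", "P02", "P03", "P04", "P05"),
  task1 = c(1, 2, 3, 1, 2),
  task2 = c(1, 3, 3, 2, 2)
)

# Participants whose cluster differs between the two tasks
changed <- clusters$ppt[clusters$task1 != clusters$task2]
changed
```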

Cluster changes by participant

Cluster membership changes by task

Cluster membership for the first task and for the second task, showing how many participants switched cluster, and to which cluster.
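A summary of this kind can be obtained by cross-tabulating the two cluster assignments. The sketch below uses made-up assignments for five hypothetical participants; none of the values come from the study.

```r
# Hypothetical cluster assignments for five made-up participants
labels <- paste("Cluster", 1:3)
task1 <- factor(c(1, 2, 3, 1, 2), levels = 1:3, labels = labels)
task2 <- factor(c(1, 3, 3, 2, 2), levels = 1:3, labels = labels)

# Rows: cluster in task 1; columns: cluster in task 2.
# Off-diagonal cells count participants who switched cluster.
transitions <- table(task1 = task1, task2 = task2)
transitions

# Number of participants who switched cluster
n_switched <- sum(transitions) - sum(diag(transitions))
n_switched
```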