Fingers

All techniques displayed below, except the actual learning curve model, are explained in my book:

New Statistics for Design Researchers

Reading data

TODO Russell:

  1. Figure out why sum or mean doesn’t matter, when \(n\) is constant.
  2. What is variable h? What we need is a variable to identify the sequence that was learned.

Reading from Excel:

Fingers <- 
  readxl::read_excel("Finger_data/MSL_BehavData_Training_6key_merged.xlsx") %>% 
  select(Study = ExperimentName,
         Part = Subject,
         Block = Block,
         Sequence = h,
         trial = List2.Cycle,
         Proc = `Procedure[SubTrial]`,
         Key = feedback.CRESP,
         Correct = feedback.ACC,
         Keypress = SubTrial,
         RT = feedback.RT) %>% 
  filter(!is.na(RT),
         Proc == "responsprocedure") %>% 
  mutate(Keypress = Keypress - 7,
         RT = RT/1000,
         Correct = as.logical(Correct))

The result is keystroke level data

# View(Fingers)

Fingers %>% 
  sample_n(20)
Study Part Block Sequence trial Proc Key Correct Keypress RT
EEGGoNo_Seq6_ID8_Training_Final 30 5 6 28 responsprocedure j TRUE 6 0.160
EEGGoNo_Seq6_ID3_Training_Final 3 4 19 14 responsprocedure f TRUE 3 0.113
EEGGoNo_Seq6_ID1_Training_Final 24 2 1 46 responsprocedure a TRUE 1 1.564
EEGGoNo_Seq6_ID6_Training_Final 20 3 22 11 responsprocedure l TRUE 1 0.133
EEGGoNo_Seq6_ID8_Training_Final 23 1 7 4 responsprocedure a TRUE 3 0.578
EEGGoNo_Seq6_ID2_Training_Final 10 2 10 45 responsprocedure j TRUE 1 0.342
EEGGoNo_Seq6_ID8_Training_Final 13 5 8 17 responsprocedure l TRUE 2 0.343
EEGGoNo_Seq6_ID6_Training_Final 20 2 22 25 responsprocedure k TRUE 3 0.133
EEGGoNo_Seq6_ID6_Training_Final 21 3 21 3 responsprocedure f TRUE 4 0.097
EEGGoNo_Seq6_ID8_Training_Final 30 1 6 29 responsprocedure l TRUE 5 0.255
EEGGoNo_Seq6_ID1_Training_Final 14 3 1 6 responsprocedure d TRUE 2 1.201
EEGGoNo_Seq6_ID6_Training_Final 16 5 22 33 responsprocedure j TRUE 2 0.032
EEGGoNo_Seq6_ID2_Training_Final 18 5 12 11 responsprocedure j TRUE 6 0.077
EEGGoNo_Seq6_ID8_Training_Final 8 4 7 1 responsprocedure f TRUE 1 0.440
EEGGoNo_Seq6_ID6_Training_Final 6 4 22 5 responsprocedure j TRUE 4 0.086
EEGGoNo_Seq6_ID5_Training_Final 5 3 30 20 responsprocedure l TRUE 3 0.316
EEGGoNo_Seq6_ID2_Training_Final 2 3 9 1 responsprocedure a TRUE 3 0.059
EEGGoNo_Seq6_ID2_Training_Final 25 3 11 11 responsprocedure a TRUE 1 0.396
EEGGoNo_Seq6_ID6_Training_Final 16 3 22 47 responsprocedure j TRUE 2 0.128
EEGGoNo_Seq6_ID6_Training_Final 16 5 24 37 responsprocedure {;} TRUE 4 0.090

We summarize over keypresses and measure trial-level correctness of response

Fingers_1 <-
  Fingers %>% 
  group_by(Study, Part, Block, Sequence, trial) %>% 
  summarize(RT  = sum(RT),
            Acc = all(Correct)) %>% 
  ungroup() %>% 
  group_by(Part, Sequence) %>% 
  mutate(trial = rank(trial)) %>% 
  ungroup()
Fingers_1 %>% 
  sample_n(20)
Study Part Block Sequence trial RT Acc
EEGGoNo_Seq6_ID4_Training_Final 31 1 27 29.0 4.238 FALSE
EEGGoNo_Seq6_ID3_Training_Final 3 5 18 35.0 0.432 TRUE
EEGGoNo_Seq6_ID4_Training_Final 12 1 27 38.0 1.064 TRUE
EEGGoNo_Seq6_ID3_Training_Final 19 3 18 18.0 2.434 TRUE
EEGGoNo_Seq6_ID4_Training_Final 31 4 25 4.0 2.784 TRUE
EEGGoNo_Seq6_ID1_Training_Final 1 5 4 1.0 3.796 TRUE
EEGGoNo_Seq6_ID8_Training_Final 23 5 5 8.5 1.308 TRUE
EEGGoNo_Seq6_ID1_Training_Final 1 3 2 56.0 2.726 FALSE
EEGGoNo_Seq6_ID7_Training_Final 17 2 14 54.0 2.027 TRUE
EEGGoNo_Seq6_ID3_Training_Final 19 3 17 51.5 1.905 TRUE
EEGGoNo_Seq6_ID3_Training_Final 3 1 18 40.0 1.258 TRUE
EEGGoNo_Seq6_ID3_Training_Final 19 4 17 9.5 1.095 TRUE
EEGGoNo_Seq6_ID8_Training_Final 30 2 8 10.5 2.071 TRUE
EEGGoNo_Seq6_ID1_Training_Final 14 4 4 45.0 1.341 TRUE
EEGGoNo_Seq6_ID7_Training_Final 22 1 15 50.5 3.833 FALSE
EEGGoNo_Seq6_ID2_Training_Final 10 4 11 43.0 0.726 TRUE
EEGGoNo_Seq6_ID2_Training_Final 25 2 11 36.0 1.918 TRUE
EEGGoNo_Seq6_ID8_Training_Final 30 4 8 35.5 1.650 TRUE
EEGGoNo_Seq6_ID2_Training_Final 2 3 12 41.5 0.866 TRUE
EEGGoNo_Seq6_ID6_Training_Final 16 5 22 8.0 0.359 TRUE
Fingers_1 %>% 
  group_by(Part, Sequence) %>% 
  summarize(n()) %>% 
  ungroup()
## `summarise()` has grouped output by 'Part'. You can override using the `.groups` argument.
Part Sequence n()
1 1 60
1 2 60
1 3 60
1 4 60
2 9 60
2 10 60
2 11 60
2 12 60
3 17 60
3 18 60
3 19 60
3 20 60
4 25 60
4 26 60
4 27 60
4 28 60
5 29 60
5 30 60
5 31 60
5 32 60
6 21 60
6 22 60
6 23 60
6 24 60
7 13 60
7 14 60
7 15 60
7 16 60
8 5 60
8 6 60
8 7 60
8 8 60
9 1 60
9 2 60
9 3 60
9 4 60
10 9 60
10 10 60
10 11 60
10 12 60
11 17 60
11 18 60
11 19 60
11 20 60
12 25 60
12 26 60
12 27 60
12 28 60
13 5 60
13 6 60
13 7 60
13 8 60
14 1 60
14 2 60
14 3 60
14 4 60
15 29 60
15 30 60
15 31 60
15 32 60
16 21 60
16 22 60
16 23 60
16 24 60
17 13 60
17 14 60
17 15 60
17 16 60
18 9 60
18 10 60
18 11 60
18 12 60
19 17 60
19 18 60
19 19 60
19 20 60
20 21 60
20 22 60
20 23 60
20 24 60
21 21 60
21 22 60
21 23 60
21 24 60
22 13 60
22 14 60
22 15 60
22 16 60
23 5 60
23 6 60
23 7 60
23 8 60
24 1 60
24 2 60
24 3 60
24 4 60
25 9 60
25 10 60
25 11 60
25 12 60
26 25 60
26 26 60
26 27 60
26 28 60
27 29 60
27 30 60
27 31 60
27 32 60
28 17 60
28 18 60
28 19 60
28 20 60
29 21 60
29 22 60
29 23 60
29 24 60
30 5 60
30 6 60
30 7 60
30 8 60
31 25 60
31 26 60
31 27 60
31 28 60

Visual learning curve analysis

Fingers_1 %>% 
  filter(RT < 30) %>% 
  ggplot(aes(x = trial,
             y = RT)) +
  #geom_point() +
  geom_smooth(se = F) +
  facet_wrap(~Part, scales = "free_y")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Fingers_1 %>% 
  filter(RT < 30) %>% 
  ggplot(aes(x = trial,
             y = RT,
             group = Part)) +
  facet_grid(~Block) +
  geom_smooth(se = F)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Fingers_1 %>% 
  filter(RT < 30) %>% 
  ggplot(aes(x = trial,
             y = RT,
             group = Part)) +
  geom_smooth(se = F)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Non-linear multi-level regression

Three-parametric exponential learning curve in ARY parametrization:

  1. Amplitude: possible amount of learning
  2. Rate: how fast a person learns
  3. asYmptote
F_ary <- formula(RT ~ asym + ampl * exp(-rate * trial))

Random effects. Here we assume that sequences are exchangeable and take enough with a participant-level model. If the sequences differ, e.g. containing different movements, we would need sequence-level random effects, too (1|)

F_ary_ef_1 <- list(formula(ampl ~ 1|Part),
                   formula(rate ~ 1|Part),
                   formula(asym ~ 1|Part))

Weakly informative priors

F_ary_prior <- c(set_prior("normal(5, 100)", nlpar = "ampl", lb = 0),
                 set_prior("normal(.5, 3)", nlpar = "rate", lb = 0),
                 set_prior("normal(3, 20)", nlpar = "asym", lb = 0))

Running the model. The Exgaussian error distribution is suited for reaction times (which are not Normally distributed).

M_1 <- 
  Fingers_1 %>% 
  brm(bf(F_ary,
         flist = F_ary_ef_1,
         nl = T), 
      prior = F_ary_prior,
      family = "exgaussian",
      data = .,
      iter = 10000, 
      warmup = 8000)

P_1 <- posterior(M_1)
PP_1 <- post_pred(M_1, thin = 10)
  
save(M_1, P_1, PP_1, Fingers, Fingers_1, file = "Fingers.Rda")

Population-level effects and random effects SD, as a measure for diversity.

fixef(P_1)
Coefficient estimates with 95% credibility limits
nonlin center lower upper
ampl 0.9393113 0.3644445 1.494470
rate 0.0049802 0.0025703 0.017177
asym 1.5973753 1.1011373 2.181002
grpef(P_1)
Coefficient estimates with 95% credibility limits
nonlin center lower upper
ampl 0.6785735 0.3565348 0.9310675
rate 0.0036989 0.0018803 0.0098698
asym 0.1444691 0.0080067 0.5901953

Participant-level learning parameters:

P_1 %>% 
  re_scores() %>% 
  ranef() %>% 
  sample_n(10)
Coefficient estimates with 95% credibility limits
nonlin re_entity center lower upper
rate 14 0.0081871 0.0033981 0.0258611
ampl 19 0.5487303 0.0433740 1.1281129
rate 9 0.0046142 0.0013326 0.0215241
asym 27 1.6575145 1.1152289 3.2891386
rate 25 0.0048672 -0.0011113 0.0197488
rate 21 0.0074436 0.0023785 0.0210244
asym 24 1.6361577 1.0781867 3.3257178
ampl 7 1.0767288 -0.0280910 1.6525793
ampl 13 1.5684734 0.5882265 2.1380801
ampl 31 1.1868278 0.1036099 1.7598291
P_1 %>% 
  re_scores() %>% 
  ranef() %>% 
  ggplot(aes(x = re_entity, 
             y = center, 
             ymin = lower, 
             ymax = upper)) +
  facet_grid(nonlin~1, scales = "free_y") +
  geom_crossbar(width = .2) +
  labs(x = "Participant", y = "value")

The estimated learning curves based on fitted responses

Fingers_1 %>% 
  mutate(M_1 = predict(PP_1)$center) %>% 
  ggplot(aes(x = trial,
             y = M_1)) +
  facet_wrap(~Part, scales = "free_y") +
  geom_smooth(se = F)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Fingers_1 %>% 
  mutate(M_1 = predict(PP_1)$center) %>% 
  filter(Part != 24, Part != 30) %>% 
  ggplot(aes(x = trial,
             y = M_1,
             group = Part)) +
  #facet_wrap(~Part, scales = "free_y") +
  geom_smooth(se = F)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Conclusion:

  • Learning curves are visible
  • Outlier participants happen
  • Learning is very slow