This script analyzes data from an experiment testing the effectiveness of different sequences of active vs. passive training in a category learning task.
In this experiment, we only ran the rule-based (easier, 1-D category structure). The hypothesis is that getting a block of receptive data will make participants more effective active learners, boosting their accuracy on test trials.
rm(list=ls())
source("helpers/useful.R")
library(dplyr)
library(magrittr)
df <- read.csv("../data/act-learn-sequence-1.csv", stringsAsFactors = F)
Overall performance (averaged across blocks) was more than three standard deviations below the mean of their group.
ss <- df %>%
filter(trial_type == "test") %>%
group_by(subids, condition) %>%
summarise(mean_accuracy = mean(correct),
ci_high_acc = ci.high(correct),
ci_low_acc = ci.low(correct))
# get the mean and standard deviation of each group
ms_ss <- ss %>%
group_by(condition) %>%
summarise(group_mean = mean(mean_accuracy),
group_sd = sd(mean_accuracy))
Map means and sd to variables
# active rb
mean_act_rb <- filter(ms_ss, condition == "active_receptive")$group_mean
sd_act_rb <- filter(ms_ss, condition == "receptive_active")$group_sd
# receptive rb
mean_rec_rb <- filter(ms_ss, condition == "active_receptive")$group_mean
sd_rec_rb <- filter(ms_ss, condition == "receptive_active")$group_sd
Flag subject means that are +/- 3sd
ss_include <- ss %>%
mutate(
include = ifelse(condition == "active_receptive",
ifelse(mean_accuracy > mean_act_rb + 3 * sd_act_rb |
mean_accuracy < mean_act_rb - 3 * sd_act_rb, "remove", "include"),
ifelse(mean_accuracy > mean_rec_rb + 3 * sd_rec_rb |
mean_accuracy < mean_rec_rb - 3 * sd_rec_rb, "remove", "include"))) %>%
select(subids, include, condition)
ss_include %>%
group_by(condition, include) %>%
summarise(n())
## Source: local data frame [2 x 3]
## Groups: condition [?]
##
## condition include n()
## (chr) (chr) (int)
## 1 active_receptive include 29
## 2 receptive_active include 23
# merge with full data frame
df <- left_join(df, select(ss_include, subids, include, condition), by = c("subids", "condition"))
# filter out subjects that are +/- 3sd
df <- df %>% filter(include == "include")
## Warning in left_join_impl(x, y, by$x, by$y): joining factor and character
## vector, coercing into character vector
## Warning in left_join_impl(x, y, by$x, by$y): joining factor and character
## vector, coercing into character vector
| condition | order | mean_exp_length | sd_exp_length | category_type | count |
|---|---|---|---|---|---|
| active_receptive | order1 | 7.680843 | 5.158424 | rule-based | 14 |
| active_receptive | order2 | 7.439557 | 5.093389 | rule-based | 15 |
| receptive_active | order1 | 6.060428 | 2.023470 | rule-based | 18 |
| receptive_active | order2 | 4.983190 | 1.555280 | rule-based | 5 |
Histogram of length of experiment split by condition
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
df_test_rt <- df %>%
filter(trial_type == "test") %>%
filter(rt > 0) %>%
mutate(include_good_rt_test = ifelse(log(rt) > mean(log(rt)) +
2 * sd(log(rt)) |
log(rt) < mean(log(rt)) -
2 * sd(log(rt)),
"exclude", "include")) %>%
select(subids, trial_number, include_good_rt_test)
df %<>% left_join(y = df_test_rt, by = c("subids", "trial_number"))
Get mean accuracy for each condition and category type
Plot.
Overall, receptive-first learners are numerically more accurate, but the difference is not reliable.
Next, we analyze accuracy across the two blocks.
Plot.
The block analysis suggests some effect of order on active learning. Receptive-first learners appear to be more accurate after their block of active learning (block 2) compared to Active-first (block 1).
But, the effect is not large enough to overcome the overall active advantage that shows up in block 1 for the Active-first learners.
Order here refers to whether size or angle was the category dimension.
Plot accuracy over blocks
Order 1 (Size) looks like what we predicted: A small boost for Receptive-first learners.
In Order 2 (Angle), there is a bigger difference between active and passive learning in the first block, with receptive learning being the hardest. But, we have the least amount of data for this condition (only 5 participants in Receptive-first, Order 2 condition).
Analyze the average distance of participants’ samples from the optimal decision boundary.
Rotate, so orientation and radius are on the same dimension.
Plot group level sampling behavior.
qplot(x=dim_of_interest, y=other_dim, data = df_sampling,
facets=.~condition) +
scale_color_brewer(type = "qual", palette = "Set1") +
geom_vline(xintercept = 300, color = "blue")
Plot individual participant sampling behavior
Get distance from optimal decision boundary for each sample.
df_sampling <- mutate(df_sampling,
samp_dist_odb = ifelse(odb_scale == "orientation_scale",
abs(odb_param - orientation_response_param),
abs(odb_param - radius_response_param)))
Now get the average distance across subjects
ms_sampling <- df_sampling %>%
group_by(condition) %>%
summarize(mean_samp_dist = mean(samp_dist_odb),
ci_high = ci.high(samp_dist_odb),
ci_low = ci.low(samp_dist_odb))
Plot.
qplot(x=condition, y=mean_samp_dist, data=ms_sampling,
geom="bar", stat="identity", fill = condition, position = "dodge") +
geom_linerange(aes(ymin=mean_samp_dist - ci_low,
ymax=mean_samp_dist + ci_high),
width = .05, size=0.6, position=position_dodge(width=0.9)) +
scale_fill_brewer(type = "qual", palette = "Set1") +
coord_cartesian(ylim=c(100, 190)) +
ylab("Mean Sample Distance") +
xlab("Condition") +
theme_classic() +
theme(text = element_text(size=16)) +
theme_classic()
Active learning is better after getting a block of receptive learning trials. Some evidence that getting receptive-first boosted active learning.
Get the mean sample distance and accuracy for each participant.
ss_samp_dist <- df_sampling %>%
filter(trial_training_block == "active") %>%
group_by(subids, condition) %>%
summarise(mean_samp_dist = mean(samp_dist_odb),
ci_high_msd = ci.high(samp_dist_odb),
ci_low_msd = ci.low(samp_dist_odb))
ss_mean_acc <- df %>%
filter(trial_training_block == "active", trial_type == "test", include_good_rt_test == "include") %>%
group_by(subids, condition, block) %>%
summarise(mean_accuracy = mean(correct),
ci_high_acc = ci.high(correct),
ci_low_acc = ci.low(correct))
# join sampling and test acc together
ss_all <- inner_join(ss_samp_dist, ss_mean_acc, by=c("subids", "condition"))
Plot
qplot(x=mean_samp_dist, y=mean_accuracy, data=ss_all) +
geom_smooth(method="lm") +
facet_grid(~condition) +
xlab("Mean Sample Distance") +
ylab("Mean Accuracy") +
theme(text = element_text(size=16))
ss_mean_acc_explore <- df %>%
filter(trial_type == "test", include_good_rt_test == "include") %>%
group_by(subids, condition, block, order) %>%
summarise(mean_accuracy = mean(correct),
ci_high_acc = ci.high(correct),
ci_low_acc = ci.low(correct))
Plot.
qplot(x=as.factor(block), y=mean_accuracy, data=ss_mean_acc_explore,
color = as.factor(subids), group = as.factor(subids), shape = order) +
geom_line() +
geom_smooth(aes(group=1), method = "lm", se = F, color = "red", size = 2) +
facet_grid(.~condition) +
xlab("Block") +
ylab("Mean Accuracy") +
theme(text = element_text(size=16)) +
guides(color = F)
Not totally sure what to make of this. But there is a different overall pattern of accuracy performance across blocks. Active-first learners have smaller slope compared to Receptive-first learners.
There are a couple of participants doing weird things – huge drop in accuracy in block 2, but show up in both conditions.
Maybe there are some other analyses to do at the individual participant level?
t.test testing difference in accuracy on test trials between active first and receptive first learning conditions.
t.test(mean_accuracy ~ condition, var.equal=TRUE, data=ss_all)
##
## Two Sample t-test
##
## data: mean_accuracy by condition
## t = -1.5475, df = 50, p-value = 0.1281
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.21538803 0.02792741
## sample estimates:
## mean in group active_receptive mean in group receptive_active
## 0.7328625 0.8265928
Overall accuracy scores of the two groups are not different.
Does condition and block predict accuracy on test trials?
m1 <- glmer(correct ~ condition * block + (1|subids),
data=filter(df, trial_type=="test"), family=binomial)
summary(m1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct ~ condition * block + (1 | subids)
## Data: filter(df, trial_type == "test")
##
## AIC BIC logLik deviance df.resid
## 3326.7 3357.2 -1658.3 3316.7 3323
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.8577 -0.6757 0.3489 0.5667 1.5153
##
## Random effects:
## Groups Name Variance Std.Dev.
## subids (Intercept) 1.216 1.103
## Number of obs: 3328, groups: subids, 52
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.23635 0.27644 4.472 7.73e-06 ***
## conditionreceptive_active -1.21614 0.41183 -2.953 0.00315 **
## block 0.04731 0.11554 0.409 0.68222
## conditionreceptive_active:block 0.86466 0.17747 4.872 1.10e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndtn_ block
## cndtnrcptv_ -0.671
## block -0.624 0.419
## cndtnrcpt_: 0.408 -0.619 -0.651
Reliable interaction between condition and block. Receptive-first learners perform better on the second block of test trials than Active-first learners.
But overall, the two groups are not different from one another. How to interpret?
Does mean accuracy depend on sampling behavior and condition?
m2 <- lm(mean_accuracy ~ mean_samp_dist * condition, data=ss_all)
summary(m2)
##
## Call:
## lm(formula = mean_accuracy ~ mean_samp_dist * condition, data = ss_all)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.66466 -0.06052 0.05570 0.14564 0.30730
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 0.227574 0.218106 1.043
## mean_samp_dist 0.002867 0.001218 2.354
## conditionreceptive_active 0.801232 0.287375 2.788
## mean_samp_dist:conditionreceptive_active -0.004209 0.001716 -2.453
## Pr(>|t|)
## (Intercept) 0.30199
## mean_samp_dist 0.02273 *
## conditionreceptive_active 0.00758 **
## mean_samp_dist:conditionreceptive_active 0.01784 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2073 on 48 degrees of freedom
## Multiple R-squared: 0.1637, Adjusted R-squared: 0.1114
## F-statistic: 3.132 on 3 and 48 DF, p-value: 0.03403
Reliable interaction between mean sample distance and condition. If you get Receptive-first, then better sampling predicts better test, but not if you get Active-first.
Which condition is “better” at sampling?
m3_samp <- lmer(samp_dist_odb ~ condition + (1|subids), data=df_sampling)
summary(m3_samp)
## Linear mixed model fit by REML ['lmerMod']
## Formula: samp_dist_odb ~ condition + (1 | subids)
## Data: df_sampling
##
## REML criterion at convergence: 10010.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.09819 -0.87023 -0.04171 0.95561 1.74905
##
## Random effects:
## Groups Name Variance Std.Dev.
## subids (Intercept) 568.1 23.84
## Residual 9586.5 97.91
## Number of obs: 832, groups: subids, 52
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 176.216 6.344 27.775
## conditionreceptive_active -25.539 9.540 -2.677
##
## Correlation of Fixed Effects:
## (Intr)
## cndtnrcptv_ -0.665
Receptive-first participants are better at sampling than active first participants.