This is an initial analysis of the data set which contains within-participant data on (i) accuracy and rate of reading comprehension and (ii) accuracy of listening comprehension of oral texts recorded at different speeds.
Participants were recruited for the study via the crowdsourcing platform Prolific (prolific.co). The following selection criteria were defined using the Prolific settings: participants were between ages 18 and 30; born and residing in Canada or the USA; with no reported language, hearing or vision difficulties; and English monolinguals. As described in detail in the Procedure below, participants were initially invited to complete the reading task. They were compensated by 3.75 GBP for their time. Those who completed this task were further invited to the listening comprehension task: the compensation was an additional 3.75 GBP.
A total of 216 participants completed the reading task and 165 of these participants also completed the listening comprehension task. After excluding individuals with incomplete responses as well as individuals who indicated in the demographic quesitonnaire that they learned English after the age of 5 or had a reading impairment, the resulting pool of participants contained 152 participants. The mean age of the participants was 25.80 (SD = 3.25), with 70 participants indicating female, 79 male, and 2 other gender, and 1 decline to response.
All stimuli came from the Lecture, Interviews and Spoken Narratives (LISN) test, a listening comprehension test, developed by Sommer, Tye-Murray, and colleagues (Sommers et al., 2007; 2011; Tye-Murray 2008). The LISN test is composed of 3 different types of recorded audio passages (lecture, interviews and narratives). Our prior study showed that narratives are a more reliable and easier passage type: the present study only used narratives.
The narratives were selected from the Rutgers University Oral History Archives of personal descriptions of life experiences (Sommer et al., 2011). The passages were between 3-5 minutes long (542–661 words per passage, median = 616 words) and were recorded by male and female professional actors with North American accents. This study used 12 out of 16 narrative passages. Six passages were presented to a given participant in the written format for reading comprehension and another six orally, at different speeds (see below) for listening comprehension.
Each passage was followed by 6 comprehension questions. The questions are in the multiple choice format with 4 options per question. There are three types of comprehension questions: information questions, integration questions and inference questions. Information questions ask participants to recall a specific piece of information from the text. Integration questions assess if the participants are able to combine multiple bits of information learned from the text. Finally, inference questions ask participants to derive implications from the information in the passages (Sommer et al., 2011). Two of each type of questions were asked for a total of 6 questions per passage. By the end of each set of 6 passages for reading and listening, participants would have answered 36 comprehension questions, or 72 questions in total. The individual scores were calculated separately for reading and listening as a percentage of questions they answered correctly out of 36.
Audio recordings of texts were manipulated using computer software package PRAAT [REF] to represent a range of speech rates. Speech rate was first calculated for the original recordings of the 12 narrative passages from the LISN test as the total number of words in the passage divided by the duration of its recording, in words per minute (wpm). The average speech rate was 180 wpm, comparable to a typical speech rate of an audio book [REF]. We adjusted the duration of each recording through compression or lenghtening to ensure the 180 wpm speech rate in the baseline condition. Further, each recording was altered for duration to obtain speech rates equal to 125% (225 wpm), 150% (270 wpm), 175% (315 wpm), 200% (360 wpm), and 225% (405 wpm) of the baseline speech rate. The range of speech rates was chosen to represent both careful speech and comfortable for listening comprehension and very rapid speech, which hinders comprehension.
The present experimental study was implemented online as two web-based experiments – for the reading and listening comprehension tasks, respectively – with a shared SQL database storing responses. Participants were recruited via the crowdsourcing platform Prolific (prolific.co) and were initially invited to complete the reading task on the crowdsourcing platform Prolific. This task took an average of roughly 20 minutes. Upon completion, same participants were further invited to complete a listening comprehension task. The listening comprehension task took an average of about 30 minutes. A break of at least one hour was observed between the tasks to minimize fatigue and potential spillover effects.
If a participant chose to complete both tasks, they were exposed to a total of 12 passages. The assignment of passages to the reading versus listening task (six each) and the assignment of passages in the listening task to six different speech rates was randomized for each participant. The reading task always came first, followed by the listening task. The order of speech rate conditions within the listening task was randomized.
Need to fill in details about consent forms, questionnaires, audio check, etc
Reading Each passage presented for reading was separated into 5-6 chunks equivalent to one or two paragraphs of the original text. Chunks appeared on the screen one at a time and participants pressed space to move to the next chunk. When the passage was read, six comprehension questions with 4 multiple choice options appeared on the screen one at a time. Participants responded by pressing a, b, c, or d. Reading times per chunk and response keys and latencies were recorded. After the last question to a passage was completed, the next passage appeared.
Listening In the beginning of this task, participants were asked to use their headphones. An audio-verification task was administered before stimulus presentation. An audio file with a test phrase was played and participants were requested to type that phrase on the screen. This step provided a chance to adjust audio settings of the participant’s computers or headphones. As the next step, audio recordings of passages were played one at a time. Participants did not have the option to pause or stop the recording. As in the reading task, after each recording, six comprehension questions appeared on the screen in the written format. Participants responded by pressing keys a, b, c, or d. Response keys and latencies were recorded.
This is where the coherent narrative ends and a spotty reporting begins.
Load up libraries and primary data.
library(tidyverse)
library(ggplot2)
library(effects)
library(lme4)
load("data.rda")
Accuracy from both the reading task and listening task, reported jointly.
responses <- left_join(responses, questions) %>% filter(user_id %in% participants$user_id, user_id >= 9)
Joining, by = c("passage", "question_num", "correct_response")
good <- names(which(table(responses$user_id) == 72)) #only use participants with both reading an listening
responses <- responses %>% filter(user_id %in% good)
length(unique(responses$user_id))
[1] 153
responses$treatment = droplevels(responses$treatment)
responses$treatment = relevel(responses$treatment, ref = "reading")
responses$score <- ifelse(responses$response == tolower(responses$correct_response), 1, 0)
table(responses$score)/nrow(responses)
0 1
0.3868918 0.5991285
responses$rt <- (responses$response_end - responses$response_start)/1000
#these are accuracy scores per subject, treatent and passage (reading/list together)
scores_passage <- responses %>% group_by(user_id, treatment, passage) %>%
filter(is.na(score)==F) %>% summarise(sum_score = sum(score, na.rm = T),
n = NROW(score)) %>% mutate(mean_score = sum_score/n)
scores_read <- responses %>% filter(treatment == "reading") %>% group_by(user_id, treatment) %>%
filter(is.na(score)==F) %>% summarise(sum_score = sum(score, na.rm = T),
n = NROW(score)) %>% mutate(mean_score = sum_score/n)
#these are accuracy scores per subject and treatment
scores_all <- responses %>% group_by(user_id, treatment) %>%
filter(is.na(score)==F) %>% summarise(sum_score = sum(score, na.rm = T),
n = NROW(score)) %>% mutate(mean_score = sum_score/n)
#these are RTs to comprehension questions per subject and treatment *NOT USED SO FAR*
rt_all <- responses %>% group_by(user_id, treatment) %>% filter(is.na(score)==F, is.na(rt) ==F) %>%
summarise(median_rt = median(rt), n_rt = NROW(rt))
#removing participants who close to chance (30% vs 25 chance)
#prop.test(0.37 * 36, 36, 0.25, alternative = "greater")
bad <- scores_read[which(scores_read$mean_score <= 0.3),]$user_id
scores_all <- scores_all %>% filter(!user_id %in% bad)
rt_all <- rt_all %>% filter(!user_id %in% bad)
#participants descriptives (reported in Participants)
user_good <- rt_all$user_id
participants %>% filter(user_id %in% user_good) %>% summarise(mean(as.numeric(as.character(age))), sd(as.numeric(as.character(age))))
mean(as.numeric(as.character(age)))
1 25.79605
sd(as.numeric(as.character(age)))
1 3.249735
participants %>% filter(user_id %in% user_good) -> x
table(x$gender)
abstain female male NULL other
1 70 79 0 2
responses$treatment = paste0("sp", as.character(responses$treatment))
responses[responses$treatment == "spreading",]$treatment = "reading"
responses$treatment = as.factor(responses$treatment)
Now to the models and plotting of accuracy. The LMER is set up for backward differences between levels of treatment (reading, speech 180, speech 225 etc.). Treatment1 is a difference in accuracy between sp180 minus reading, Treatment2 is sp225 minus sp180 etc.
my.backward.diff = matrix(c(-1/7, 6/7, 6/7, 6/7, 6/7, 6/7, 6/7,
-2/7, -2/7, 5/7, 5/7, 5/7, 5/7, 5/7,
-3/7, -3/7, -3/7, 4/7, 4/7, 4/7, 4/7,
-4/7, -4/7, -4/7, -4/7, 3/7, 3/7, 3/7,
-5/7, -5/7, -5/7, -5/7, -5/7, 2/7, 2/7,
-6/7, -6/7, -6/7, -6/7, -6/7, -6/7, 1/7), ncol = 6)
# my.backward.diff = matrix(c(-1/6, 5/6, 5/6, 5/6, 5/6, 5/6,
# -2/6, -2/6, 4/6, 4/6, 4/6, 4/6,
# -3/6, -3/6, -3/6, 3/6, 3/6, 3/6,
# -4/6, -4/6, -4/6, -4/6, 2/6, 2/6,
# -5/6, -5/6, -5/6, -5/6, -5/6, 1/6), ncol = 5)
my.backward.diff
[,1] [,2] [,3] [,4] [,5]
[1,] -0.1428571 -0.2857143 -0.4285714 -0.5714286 -0.7142857
[2,] 0.8571429 -0.2857143 -0.4285714 -0.5714286 -0.7142857
[3,] 0.8571429 0.7142857 -0.4285714 -0.5714286 -0.7142857
[4,] 0.8571429 0.7142857 0.5714286 -0.5714286 -0.7142857
[5,] 0.8571429 0.7142857 0.5714286 0.4285714 -0.7142857
[6,] 0.8571429 0.7142857 0.5714286 0.4285714 0.2857143
[7,] 0.8571429 0.7142857 0.5714286 0.4285714 0.2857143
[,6]
[1,] -0.8571429
[2,] -0.8571429
[3,] -0.8571429
[4,] -0.8571429
[5,] -0.8571429
[6,] -0.8571429
[7,] 0.1428571
contrasts(responses$treatment) = my.backward.diff
mod_comp = glmer(score ~ treatment + (1 | user_id) +
(1 | passage), data = responses, family = "binomial")
summary(mod_comp)
Generalized linear mixed model fit by maximum likelihood
(Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula: score ~ treatment + (1 | user_id) + (1 | passage)
Data: responses
AIC BIC logLik deviance df.resid
13555.5 13621.1 -6768.7 13537.5 10853
Scaled residuals:
Min 1Q Median 3Q Max
-3.4462 -0.9848 0.5353 0.7489 3.0720
Random effects:
Groups Name Variance Std.Dev.
user_id (Intercept) 0.299 0.5469
passage (Intercept) 0.275 0.5244
Number of obs: 10862, groups: user_id, 153; passage, 12
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.283208 0.172210 1.645 0.10006
treatment1 -0.249715 0.077999 -3.201 0.00137 **
treatment2 0.009679 0.101917 0.095 0.92434
treatment3 0.079326 0.101799 0.779 0.43584
treatment4 -0.291307 0.100818 -2.889 0.00386 **
treatment5 -0.105192 0.100348 -1.048 0.29451
treatment6 -0.171392 0.099616 -1.721 0.08534 .
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) trtmn1 trtmn2 trtmn3 trtmn4 trtmn5
treatment1 -0.114
treatment2 -0.001 -0.651
treatment3 0.001 -0.001 -0.501
treatment4 0.001 0.002 -0.002 -0.502
treatment5 -0.001 -0.003 0.002 0.001 -0.494
treatment6 0.284 0.003 -0.001 -0.001 0.000 -0.508
#plot(Effect(mod_comp, focal.predictors = "treatment"))
ef <- data.frame(Effect(mod_comp, focal.predictors = "treatment"))
Now the accuracy plot
ggplot(data = ef, aes(x = treatment, y = fit))+ labs(x = "Treatment", y = "Comprehension score")+ ggtitle("Comprehension accuracy") +
geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=.2, position=position_dodge(width=0.1))+
geom_point(position=position_dodge(width=0.1), (aes(size=1.)), show.legend =F)
The results are beautiful. All average scores range between 66 and 49% (with the baseline of 25%). Comprehension scores across speech rates are virtually constant between 180 and 270 wpm, then they monotonically (linearly?) decrease. The only contrast between adjacent levels reaching significance is a drop between 270 and 315 wpm. Also, reading comprehension scores are numerically higher than all listening comprehension scores: significantly so when reading was compared with the audio book speed (which is an interesting finding on its own).
Reading times are given per chunk. Some reading times (and as a result, reading rates) are completely insane, in both directions – too long or too short. Need to decide how to trim them. Here are distributions of the raw data in minutes or wpm. The median before trimming is 254 wpm.
#this is removing all those who were bad listeners or otherwise infelicitous
reading_times <- reading_times %>% filter(!user_id %in% bad, user_id %in% good)
read_rate <- inner_join(reading_times, chunks[, -1])
Joining, by = c("passage", "chunk_num")
read_rate$read_time <- (read_rate$chunk_end - read_rate$chunk_start)/1000/60
#this is a by-percent breakdown of the distribution
quantile(read_rate$read_time, seq(0, 1, 0.01))
0% 1% 2% 3% 4%
0.00025000 0.01002733 0.01166667 0.01336600 0.01501400
5% 6% 7% 8% 9%
0.01717333 0.02065000 0.02705733 0.03729333 0.05449933
10% 11% 12% 13% 14%
0.07334000 0.08758133 0.09990067 0.11576267 0.12770800
15% 16% 17% 18% 19%
0.14040333 0.15310667 0.16526133 0.17709000 0.19108000
20% 21% 22% 23% 24%
0.20041000 0.21076133 0.21936867 0.22930600 0.23834133
25% 26% 27% 28% 29%
0.24613333 0.25222733 0.26102600 0.26786533 0.27573933
30% 31% 32% 33% 34%
0.28276333 0.28793200 0.29349067 0.29946400 0.30627933
35% 36% 37% 38% 39%
0.31237000 0.31844333 0.32407267 0.33040800 0.33667267
40% 41% 42% 43% 44%
0.34254667 0.34837067 0.35667400 0.36380000 0.36999600
45% 46% 47% 48% 49%
0.37615333 0.38351600 0.39223600 0.39997400 0.40752267
50% 51% 52% 53% 54%
0.41650000 0.42331000 0.43106867 0.43980600 0.44898867
55% 56% 57% 58% 59%
0.45738667 0.46286267 0.47117467 0.47819600 0.48600000
60% 61% 62% 63% 64%
0.49542000 0.50289267 0.51369733 0.52190133 0.52858600
65% 66% 67% 68% 69%
0.54126333 0.55027467 0.56006667 0.57049800 0.58106333
70% 71% 72% 73% 74%
0.59493333 0.60674400 0.62099600 0.63299667 0.64952400
75% 76% 77% 78% 79%
0.66586667 0.68034600 0.69636533 0.71279400 0.72663067
80% 81% 82% 83% 84%
0.74424000 0.76342867 0.78032200 0.80373600 0.82833733
85% 86% 87% 88% 89%
0.85652000 0.89149467 0.92244000 0.95265933 0.98772267
90% 91% 92% 93% 94%
1.03017000 1.08529267 1.13057467 1.19211133 1.28467400
95% 96% 97% 98% 99%
1.41357333 1.58037867 1.79813533 2.06467867 3.02722133
100%
35.33828333
read_rate$rate = read_rate$chunk_words/read_rate$read_time
round(quantile(read_rate$rate, seq(0, 1, 0.01)),2)
0% 1% 2% 3% 4% 5%
2.63 37.37 50.54 62.94 69.74 76.23
6% 7% 8% 9% 10% 11%
82.85 89.28 96.28 101.93 107.33 112.33
12% 13% 14% 15% 16% 17%
116.27 119.57 123.95 127.42 130.99 134.92
18% 19% 20% 21% 22% 23%
137.39 141.60 145.92 149.52 152.79 155.89
24% 25% 26% 27% 28% 29%
158.97 163.26 167.21 170.79 174.81 177.50
30% 31% 32% 33% 34% 35%
181.58 185.68 189.66 194.21 197.29 200.88
36% 37% 38% 39% 40% 41%
203.24 207.48 211.96 215.24 219.31 222.06
42% 43% 44% 45% 46% 47%
226.75 230.22 233.84 238.46 241.72 244.49
48% 49% 50% 51% 52% 53%
246.77 250.10 253.91 257.98 263.46 267.46
54% 55% 56% 57% 58% 59%
271.45 275.15 279.89 283.95 288.74 293.24
60% 61% 62% 63% 64% 65%
298.80 303.66 308.38 314.46 320.19 325.52
66% 67% 68% 69% 70% 71%
331.97 339.70 346.52 354.20 361.19 368.69
72% 73% 74% 75% 76% 77%
376.36 386.51 396.15 407.88 417.26 429.75
78% 79% 80% 81% 82% 83%
443.38 458.75 481.94 511.25 556.47 607.89
84% 85% 86% 87% 88% 89%
663.09 743.87 844.24 928.41 1075.33 1267.16
90% 91% 92% 93% 94% 95%
1476.10 1996.83 2755.71 4053.62 5227.79 6159.40
96% 97% 98% 99% 100%
7051.93 7996.53 9061.46 11305.44 504000.00
summary(read_rate$rate) #before trimming
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.6 163.3 253.9 1278.5 407.9 504000.0
#median 254
My solution so far was to take the hypothesized (and commonly observed) reading rate of 280 wpm and restrict the range to 2.5 or 1/2.5 times this rate. The median comes out at 246 wpm. But this is really something to discuss!
read_rate <- read_rate %>% filter(rate > 280/2.5 , rate < 280 * 2.5)
hist(read_rate$rate, breaks = 30)
summary(read_rate$rate) #after trimming
Min. 1st Qu. Median Mean 3rd Qu. Max.
112.1 179.0 246.3 270.5 332.4 699.0
#median 246
Given the trimming above, we find the mean reading rate by subject by passage and then average this by subject. Median is 262 wpm.
part_passage <- read_rate %>% group_by(user_id, passage) %>% summarise(mean_rate = mean(rate))
part_rate <- part_passage %>% group_by(user_id) %>% summarise(mean_rate = mean(mean_rate))
summary(part_rate$mean_rate)
Min. 1st Qu. Median Mean 3rd Qu. Max.
118.8 200.5 261.9 277.7 346.2 609.6
#262
Now we combine reading rate and reading comprehension scores by passage.
scores_passage <- scores_passage %>% filter(treatment == "reading")
read_passage <- inner_join(part_passage, scores_passage)
Joining, by = c("user_id", "passage")
part_readcomp <- read_passage %>% group_by(user_id) %>% summarise(readcomp = mean(mean_score))
What is the influence of one’s reading rate on their reading comprehension? None whatsoever. This is another illustration of how these two are disjoint (a point made in Kuperman’s random forest paper and a recent special issue on online vs offline measures of comprehension). I could do a better job with scores (which are bound to the 0-1 scale now) etc, but I doubt it will change much.
plot(read_passage$mean_rate, read_passage$mean_score)
cor.test(read_passage$mean_score, read_passage$mean_rate)
Pearson's product-moment correlation
data: read_passage$mean_score and read_passage$mean_rate
t = -0.61629, df = 817, p-value = 0.5379
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.08992831 0.04701832
sample estimates:
cor
-0.02155611
hist(read_passage$mean_rate)
mod_passage = lmer(mean_score ~ mean_rate + (1 | user_id) + (1 | passage), data = read_passage)
#cor(fitted(mod_passage), read_passage$mean_rate)^2
mod_passage1 = lmer(mean_score ~ mean_rate + (1 | user_id) + (1 | passage), data = read_passage, subset = abs(scale(resid(mod_passage)))<2.5)
summary(mod_passage1)
Linear mixed model fit by REML ['lmerMod']
Formula:
mean_score ~ mean_rate + (1 | user_id) + (1 | passage)
Data: read_passage
Subset: abs(scale(resid(mod_passage))) < 2.5
REML criterion at convergence: -444
Scaled residuals:
Min 1Q Median 3Q Max
-2.73352 -0.57625 0.04311 0.61750 2.19748
Random effects:
Groups Name Variance Std.Dev.
user_id (Intercept) 0.01224 0.1106
passage (Intercept) 0.02694 0.1641
Residual 0.02443 0.1563
Number of obs: 809, groups: user_id, 150; passage, 12
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.692e-01 5.223e-02 12.813
mean_rate -6.237e-05 6.964e-05 -0.896
Correlation of Fixed Effects:
(Intr)
mean_rate -0.368
Now let’s see if reading and listening comprehension correlate within participant.
scores_all %>% select(user_id, treatment, mean_score) %>% spread(key = treatment, value = mean_score) -> scores_wide
round(cor(scores_wide[, -1]),2)
reading sp180 sp225 sp270 sp315 sp360 sp405
reading 1.00 0.36 0.19 0.26 0.24 0.30 0.13
sp180 0.36 1.00 0.22 0.25 0.28 0.33 0.17
sp225 0.19 0.22 1.00 0.17 0.20 0.23 0.16
sp270 0.26 0.25 0.17 1.00 0.25 0.33 0.10
sp315 0.24 0.28 0.20 0.25 1.00 0.39 0.23
sp360 0.30 0.33 0.23 0.33 0.39 1.00 0.22
sp405 0.13 0.17 0.16 0.10 0.23 0.22 1.00
Fairly weak correlations all around, especially for the very high speech rate 405 wpm.
Now let’s break participants into groups by their reading comprehension and see if this predicts a different inflection point in listening comprehension.
scores_all %>% filter(treatment == "reading") -> tmp1
tmp2 <- data.frame(user_id = tmp1$user_id, readcomp = cut(tmp1$mean_score, breaks = quantile(tmp1$mean_score, c(0, 0.33, 0.66, 1)), labels = c("low", "mid", "high"), include.lowest = T))
#include read comp level into raw data
responses1 <- inner_join(responses, tmp2)
Joining, by = "user_id"
table(responses1$readcomp)
low mid high
3744 3528 3672
#run a model with an interaction by read comp level
responses1 %>% group_by(treatment, readcomp) %>%
summarise(sum_score = sum(score, na.rm = T),
n = NROW(score)) %>% mutate(mean_score = sum_score/n) -> x
#ggplot(aes(x = treatment, y = mean_score, col = readcomp), data = x) + geom_point()
mod_comp_brdown = glmer(score ~ treatment * readcomp + (1 | user_id) +
(1 | passage), data = responses1, subset = treatment != "reading", family = "binomial", control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=2e5)))
contrasts dropped from factor treatment due to missing levelscontrasts dropped from factor treatment due to missing levels
summary(mod_comp_brdown)
Generalized linear mixed model fit by maximum likelihood
(Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula:
score ~ treatment * readcomp + (1 | user_id) + (1 | passage)
Data: responses1
Control:
glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
Subset: treatment != "reading"
AIC BIC logLik deviance df.resid
7044.0 7175.8 -3502.0 7004.0 5371
Scaled residuals:
Min 1Q Median 3Q Max
-3.0929 -0.9621 0.5456 0.8091 2.0640
Random effects:
Groups Name Variance Std.Dev.
user_id (Intercept) 0.2595 0.5094
passage (Intercept) 0.1401 0.3743
Number of obs: 5391, groups: user_id, 152; passage, 12
Fixed effects:
Estimate Std. Error z value
(Intercept) -0.12838 0.17620 -0.729
treatmentspsp225 0.19272 0.16910 1.140
treatmentspsp270 0.33480 0.16922 1.978
treatmentspsp315 -0.06359 0.17016 -0.374
treatmentspsp360 -0.06465 0.16979 -0.381
treatmentspsp405 -0.14100 0.16794 -0.840
readcompmid 0.65779 0.20413 3.222
readcomphigh 1.06321 0.20487 5.190
treatmentspsp225:readcompmid -0.11970 0.24948 -0.480
treatmentspsp270:readcompmid -0.28465 0.24822 -1.147
treatmentspsp315:readcompmid -0.11964 0.24821 -0.482
treatmentspsp360:readcompmid -0.33538 0.24743 -1.355
treatmentspsp405:readcompmid -0.50129 0.24504 -2.046
treatmentspsp225:readcomphigh -0.44315 0.24682 -1.795
treatmentspsp270:readcomphigh -0.41198 0.24922 -1.653
treatmentspsp315:readcomphigh -0.34930 0.24666 -1.416
treatmentspsp360:readcomphigh -0.41692 0.24724 -1.686
treatmentspsp405:readcomphigh -0.52530 0.24427 -2.150
Pr(>|z|)
(Intercept) 0.46624
treatmentspsp225 0.25441
treatmentspsp270 0.04787 *
treatmentspsp315 0.70862
treatmentspsp360 0.70338
treatmentspsp405 0.40113
readcompmid 0.00127 **
readcomphigh 2.11e-07 ***
treatmentspsp225:readcompmid 0.63138
treatmentspsp270:readcompmid 0.25146
treatmentspsp315:readcompmid 0.62979
treatmentspsp360:readcompmid 0.17528
treatmentspsp405:readcompmid 0.04078 *
treatmentspsp225:readcomphigh 0.07258 .
treatmentspsp270:readcomphigh 0.09832 .
treatmentspsp315:readcomphigh 0.15673
treatmentspsp360:readcomphigh 0.09174 .
treatmentspsp405:readcomphigh 0.03152 *
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation matrix not shown by default, as p = 18 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
ef1 = data.frame(Effect(mod_comp_brdown, focal.predictors = c("treatment", "readcomp")))
contrasts dropped from factor treatment due to missing levels
ggplot(data = ef1, aes(x = treatment, y = fit, col = readcomp))+ labs(x = "Treatment", y = "Comprehension score")+ ggtitle("Comprehension accuracy") +
geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=.2, position=position_dodge(width=0.1))+
geom_point(position=position_dodge(width=0.1), (aes(size=1.)), show.legend =F)
At all levels of reading comprehension, the drop is between 270 and 315 wpm - even if it’s not always significant. This is a tad greater than the reading rate.
Finally, moving to a very interesting question of whether people with greater reading rate tolerate higher speech rates better?
quantile(read_rate$rate, c(0, 0.33, 0.66, 1))
0% 33% 66% 100%
112.0598 201.3696 295.6064 699.0079
tmp2 <- data.frame(user_id = part_rate$user_id, mean_rate = part_rate$mean_rate, ratefactor = cut(part_rate$mean_rate, breaks = quantile(part_rate$mean_rate, c(0, 0.33, 0.66, 1)), labels = c("low", "mid", "high"), include.lowest = T))
#include read comp level into raw data
responses2 <- inner_join(responses, tmp2)
Joining, by = "user_id"
table(responses2$ratefactor)
low mid high
3600 3528 3672
#run a model with an interaction by read comp level
responses2 %>% group_by(treatment, ratefactor) %>%
summarise(sum_score = sum(score, na.rm = T),
n = NROW(score)) %>% mutate(mean_score = sum_score/n) -> x
#ggplot(aes(x = treatment, y = mean_score, col = ratefactor), data = x) + geom_point() + facet_grid(~ ratefactor)
mod_rate_brdown = glmer(score ~ treatment * ratefactor + (1 | user_id) +
(1 | passage), data = responses2, family = "binomial", control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=2e5)))
summary(mod_rate_brdown)
Generalized linear mixed model fit by maximum likelihood
(Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula:
score ~ treatment * ratefactor + (1 | user_id) + (1 | passage)
Data: responses2
Control:
glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
AIC BIC logLik deviance df.resid
13297.3 13464.5 -6625.6 13251.3 10626
Scaled residuals:
Min 1Q Median 3Q Max
-3.3831 -0.9834 0.5358 0.7484 3.0261
Random effects:
Groups Name Variance Std.Dev.
user_id (Intercept) 0.2782 0.5274
passage (Intercept) 0.2764 0.5257
Number of obs: 10649, groups: user_id, 150; passage, 12
Fixed effects:
Estimate Std. Error z value
(Intercept) 0.18232 0.20779 0.877
treatment1 -0.14324 0.13761 -1.041
treatment2 -0.11261 0.17841 -0.631
treatment3 0.26491 0.17912 1.479
treatment4 -0.53761 0.17728 -3.033
treatment5 -0.23435 0.17344 -1.351
treatment6 -0.08183 0.17340 -0.472
ratefactormid 0.05281 0.20176 0.262
ratefactorhigh 0.26605 0.19945 1.334
treatment1:ratefactormid -0.11536 0.19464 -0.593
treatment2:ratefactormid 0.30604 0.25433 1.203
treatment3:ratefactormid -0.40940 0.25455 -1.608
treatment4:ratefactormid 0.55751 0.25295 2.204
treatment5:ratefactormid -0.10753 0.24879 -0.432
treatment6:ratefactormid -0.13827 0.24714 -0.559
treatment1:ratefactorhigh -0.17368 0.19150 -0.907
treatment2:ratefactorhigh 0.03997 0.24983 0.160
treatment3:ratefactorhigh -0.18234 0.25070 -0.727
treatment4:ratefactorhigh 0.24709 0.24878 0.993
treatment5:ratefactorhigh 0.42657 0.24765 1.722
treatment6:ratefactorhigh -0.13812 0.24546 -0.563
Pr(>|z|)
(Intercept) 0.38026
treatment1 0.29791
treatment2 0.52789
treatment3 0.13916
treatment4 0.00243 **
treatment5 0.17663
treatment6 0.63699
ratefactormid 0.79351
ratefactorhigh 0.18223
treatment1:ratefactormid 0.55338
treatment2:ratefactormid 0.22885
treatment3:ratefactormid 0.10777
treatment4:ratefactormid 0.02752 *
treatment5:ratefactormid 0.66560
treatment6:ratefactormid 0.57583
treatment1:ratefactorhigh 0.36443
treatment2:ratefactorhigh 0.87289
treatment3:ratefactorhigh 0.46701
treatment4:ratefactorhigh 0.32059
treatment5:ratefactorhigh 0.08499 .
treatment6:ratefactorhigh 0.57365
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation matrix not shown by default, as p = 21 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
ef2 = data.frame(Effect(mod_rate_brdown, focal.predictors = c("treatment", "ratefactor")))
ggplot(data = ef2, aes(x = treatment, y = fit, col = ratefactor))+ labs(x = "Treatment", y = "Comprehension score")+ ggtitle("Comprehension accuracy") +
geom_errorbar(aes(ymin=fit-se, ymax=fit+se), width=.2, position=position_dodge(width=0.1))+
geom_point(position=position_dodge(width=0.1), (aes(size=1.)), show.legend =F) + facet_grid(~ ratefactor)
This is really interesting! (The panels need reordering). What reading rate does is manipulate the range of changes in the listening rate, from the smallest in the fastest readers to the largest in the slowest readers. More specifically, fastest readers are worse at listening comprehension at low speeds but pretty consistent (i.e., lose less) over the entire speech rate range. They are the most tolerant to very high speech rates, and there is no clear drop in performance (inflection point).
In readers with both the mid- and low-reading rate, there is a clear drop. Readers with a mid-range reading rate maintain top performance at higher speech rates than slowest readers (up to 315 wp vs 270 wpm). So there is a clear hierarchy: fastest readers are the most tolerant, followed by mid-range, followed by low range of reading rate. But this tolerance comes at a cost. Top performance in listening comprehension is associated with mid and low reading rates, while fastest readers are mediocre in this task.