For 35 individuals, we have their responses to 40 questions on each of two tests, a 'pre-test' and a 'post-test'. The individuals were randomly assigned to five groups of seven: four treatment groups and one control group. Between the two tests, the individuals assigned to three of the non-control groups had linguistic training of different types (morphology, morphonology and phonology). In addition, 7 of the participants benefited from highlighting of vowels (HLV, 'highlight vowel'). We want to find out whether there is any difference between the pre-test and post-test scores, and whether the scores differ by group. The distribution of the participants is tabulated below.
## recode
##      Control          HLV   Morphology Morphonology    Phonology
##            7            7            7            7            7
To get the data into a dataframe format consisting of response variable (post-test score) and explanatory variables (pre-test score, group membership), I recorded 'vowel blindness' when the result was “0-1”. I then counted the number of instances of vowel blindness for the pre-test and the post-test scores. For the pre-test, there were 808 instances of vowel blindness and for the post-test the figure was 570. The total possible was 35 x 40 = 1400.
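The tabulation described above can be sketched as follows. This is a minimal illustration on made-up data, assuming each response is stored as a character string and that "0-1" marks vowel blindness; the object names (responses, is_vb, vb_per_person and so on) and the placeholder code "1-1" are hypothetical, not those used in the original analysis.

```r
# Toy data: 3 participants x 4 questions (the real data are 35 x 40).
# "1-1" is a placeholder for any response that is not vowel blindness.
responses <- data.frame(
  Q1 = c("0-1", "1-1", "0-1"),
  Q2 = c("1-1", "1-1", "0-1"),
  Q3 = c("0-1", "0-1", "0-1"),
  Q4 = c("1-1", "0-1", "1-1"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE wherever a response is coded as vowel blindness
is_vb <- responses == "0-1"

# Vowel blindness score per participant (rows) and per question (columns)
vb_per_person   <- rowSums(is_vb)
vb_per_question <- colSums(is_vb)

# Overall count and the maximum possible count
total_vb <- sum(is_vb)
max_vb   <- nrow(responses) * ncol(responses)
```

Applied to the full 35 x 40 response table for each test, the same two lines (the comparison and sum) would give the overall counts out of 1400.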
Before training the randomly assigned groups, we should check whether there is any pre-existing difference in abilities. We therefore examine the pre-test scores by the treatment group to which each participant was assigned.
Here we can see the pre-test scores divided by group. By eye, the HLV group seems to have a slightly higher pre-test score. We can test for statistical differences between the five groups:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = prevb ~ recode, data = reem2)
##
## $recode
## diff lwr upr p adj
## HLV-Control 1.0000 -5.867 7.867 0.9930
## Morphology-Control -4.5714 -11.439 2.296 0.3236
## Morphonology-Control -1.8571 -8.725 5.010 0.9332
## Phonology-Control -2.0000 -8.867 4.867 0.9143
## Morphology-HLV -5.5714 -12.439 1.296 0.1565
## Morphonology-HLV -2.8571 -9.725 4.010 0.7475
## Phonology-HLV -3.0000 -9.867 3.867 0.7127
## Morphonology-Morphology 2.7143 -4.153 9.582 0.7808
## Phonology-Morphology 2.5714 -4.296 9.439 0.8122
## Phonology-Morphonology -0.1429 -7.010 6.725 1.0000
There is no statistically significant difference (none of the p-values are smaller than 0.05). All of the confidence intervals contain zero.
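The comparison above follows the formula shown in the 'Fit' line of the output (prevb ~ recode). A minimal sketch of that kind of call is given here on simulated scores, since the original data frame is not reproduced in this report; the name reem2_sim and the simulated values are assumptions for illustration only.

```r
set.seed(1)  # simulated data only; these are not the study's scores

# Five groups of seven, mirroring the design
recode <- factor(rep(c("Control", "HLV", "Morphology",
                       "Morphonology", "Phonology"), each = 7))
reem2_sim <- data.frame(recode = recode,
                        prevb  = round(rnorm(35, mean = 23, sd = 5)))

# One-way ANOVA, then Tukey's honest significant differences
fit   <- aov(prevb ~ recode, data = reem2_sim)
tukey <- TukeyHSD(fit, conf.level = 0.95)

# Pairwise table with columns diff, lwr, upr and p adj
tukey$recode
```

With five groups there are ten pairwise comparisons, which is why the table has ten rows; the family-wise adjustment keeps the overall error rate at 5% across all of them.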
Next, check for differences in the pre-test scores between the control group and the four treatment groups consolidated.
The control group is on the left and appears to have a greater median than the other four groups combined. We test for the difference:
##
## Welch Two Sample t-test
##
## data: prevb by groups2
## t = 0.8374, df = 8.061, p-value = 0.4265
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.250 6.965
## sample estimates:
## mean in group Control mean in group Treatment
## 24.57 22.71
The p-value is 0.4265, meaning that there is no statistically significant difference between the control group and the four other groups. Neither of these two sets of tests provides evidence of a pre-existing difference between the groups at the pre-test stage.
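The 'data: prevb by groups2' line in the output suggests the formula interface of t.test was used. A hedged sketch of such a call, on simulated scores with the same group sizes (7 control, 28 treatment), might look like this; the data frame sim and its values are invented for illustration.

```r
set.seed(2)  # simulated data only; these are not the study's scores

groups2 <- factor(rep(c("Control", "Treatment"), times = c(7, 28)))
sim <- data.frame(groups2 = groups2,
                  prevb   = round(rnorm(35, mean = 23, sd = 5)))

# var.equal = FALSE (the default) requests the Welch test, which does
# not assume equal variances in the two groups
welch <- t.test(prevb ~ groups2, data = sim, var.equal = FALSE)

welch$p.value   # two-sided p-value
welch$conf.int  # 95% confidence interval for the difference in means
```

The Welch version is a reasonable default here because the two groups are very different in size, so an equal-variance assumption would be hard to check.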
First, a boxplot of post-test scores divided into the control group and the four treatment groups consolidated.
The post-test score for the control group is on the left. By eye, the control group appears to have a higher median than the four treatment groups together. We can test for the difference:
##
## Welch Two Sample t-test
##
## data: postvb by groups2
## t = 7.014, df = 8.023, p-value = 0.0001096
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 8.632 17.082
## sample estimates:
## mean in group Control mean in group Treatment
## 26.57 13.71
The p-value is very small at 0.0001096 and so we can confidently say that the post-test scores between the control group and the four treatment groups are different.
Then by the individual groups (four treatment and one control):
The control group appears to have both a larger median and a larger variance. The highlight vowel group (HLV) has one outlier, represented by the dot towards the x axis.
Use an ANOVA with Tukey's multiple comparisons to test for differences between the five sample means:
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = postvb ~ recode, data = reem2)
##
## $recode
## diff lwr upr p adj
## HLV-Control -11.5714 -17.591 -5.551 0.0000
## Morphology-Control -13.1429 -19.163 -7.123 0.0000
## Morphonology-Control -12.8571 -18.877 -6.837 0.0000
## Phonology-Control -13.8571 -19.877 -7.837 0.0000
## Morphology-HLV -1.5714 -7.591 4.449 0.9408
## Morphonology-HLV -1.2857 -7.306 4.734 0.9708
## Phonology-HLV -2.2857 -8.306 3.734 0.8045
## Morphonology-Morphology 0.2857 -5.734 6.306 0.9999
## Phonology-Morphology -0.7143 -6.734 5.306 0.9968
## Phonology-Morphonology -1.0000 -7.020 5.020 0.9885
The control group has a mean different from the other four groups. Compared pairwise, any combination containing 'Control' does not have zero in its confidence interval.
First a boxplot
The post-test scores seem to be smaller than the pre-test scores. We use a paired t-test because the two sets of scores are matched by participant:
##
## Paired t-test
##
## data: prevb and postvb
## t = 6.563, df = 34, p-value = 1.616e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4.694 8.906
## sample estimates:
## mean of the differences
## 6.8
It is clear from the very small p-value (1.616e-07) that the means are indeed different. The mean difference between pre-test and post-test scores is 6.8. This means that on average post-test scores were 6.8 points lower than pre-test scores.
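As a quick check, the mean difference of 6.8 can be recovered from the totals reported earlier (808 and 570 instances over 35 participants). The sketch below also illustrates, on simulated scores that are not the study's data, that a paired t-test is the same as a one-sample t-test on the per-participant differences.

```r
# Totals reported earlier in the text
pre_total  <- 808
post_total <- 570
n          <- 35

# Mean of the paired differences (pre-test minus post-test)
mean_diff <- (pre_total - post_total) / n
mean_diff

# Equivalence of the paired test and a one-sample test on differences,
# shown on simulated scores (illustration only)
set.seed(3)
pre  <- round(rnorm(n, mean = 23, sd = 5))
post <- pre - round(rnorm(n, mean = 7, sd = 6))

paired  <- t.test(pre, post, paired = TRUE)
onesamp <- t.test(pre - post, mu = 0)

all.equal(unname(paired$statistic), unname(onesamp$statistic))
```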
Now we use linear regression to estimate the effect of the 'factor' variables of group membership. First, the difference between the control group and the four treatment groups combined:
##
## Call:
## lm(formula = postvb ~ prevb + groups2, data = reem2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.607 -1.666 -0.015 1.751 7.254
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.551 3.347 4.94 2.3e-05 ***
## prevb 0.408 0.126 3.23 0.0028 **
## groups2Treatment -12.100 1.428 -8.47 1.1e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.33 on 32 degrees of freedom
## Multiple R-squared: 0.746, Adjusted R-squared: 0.73
## F-statistic: 46.9 on 2 and 32 DF, p-value: 3.08e-10
The model has an adjusted R-squared value of 0.7297, meaning that about 73% of the variance in the post-test scores is explained by the model (this is pretty good!). Both variables, the pre-test score and membership of the consolidated treatment group, are statistically significant, with p-values of 0.00285 and the vanishingly small 1.10e-09. The pre-test score variable has a positive sign, meaning that a higher pre-test score is related to a higher post-test score (which makes sense). Membership of one of the four treatment groups reduces the post-test score by 12.0998 points, holding the pre-test score constant. This is the significant negative difference illustrated on the boxplot.
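To make the interpretation concrete, the fitted equation can be evaluated by hand using the rounded coefficients from the output above. The helper predict_post and the pre-test score of 23 are hypothetical, chosen only for illustration.

```r
# Coefficients copied (rounded) from the regression output above
b0        <- 16.551   # intercept: control group with a pre-test score of 0
b_prevb   <- 0.408    # slope for the pre-test score
b_treated <- -12.100  # shift for membership of a treatment group

# Hypothetical helper: expected post-test score for a given pre-test score
predict_post <- function(prevb, treated) {
  b0 + b_prevb * prevb + b_treated * as.numeric(treated)
}

# Two participants with the same pre-test score (23 is an illustrative value)
control_pred   <- predict_post(23, treated = FALSE)
treatment_pred <- predict_post(23, treated = TRUE)

# The gap between them is the treatment coefficient, whatever the pre-test score
control_pred - treatment_pred
```

Because the model has no interaction term, the gap between the two predictions is the same at every pre-test score.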
We can plot the differences, with the control group coded as zero and represented by the red dots.
The effect of the individual groups is also available through linear regression:
##
## Call:
## lm(formula = postvb ~ prevb + recode, data = reem2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.61 -2.16 0.02 1.82 7.25
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.039 3.724 4.31 0.00017 ***
## prevb 0.429 0.142 3.02 0.00525 **
## recodeHLV -12.000 1.847 -6.50 4.1e-07 ***
## recodeMorphology -11.183 1.952 -5.73 3.4e-06 ***
## recodeMorphonology -12.061 1.860 -6.48 4.3e-07 ***
## recodePhonology -13.000 1.863 -6.98 1.1e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.44 on 29 degrees of freedom
## Multiple R-squared: 0.754, Adjusted R-squared: 0.711
## F-statistic: 17.7 on 5 and 29 DF, p-value: 4.77e-08
The reference level of the group membership variable is 'Control'. The negative signs mean that, compared with members of the control group, all four other groups had lower post-test scores. The smallest difference was for the Morphology treatment group, which had on average a post-test score 11.1834 points lower than a control group member with the same pre-test score. The greatest difference was for Phonology, at 12.9999 points.
This is a statistically significant model, and an interesting result although the sample size is small.
Classification and regression trees are a relatively new approach to partitioning data. The 'splits' occur at points which are statistically significant. Here I have used exactly the same formula as for the linear regression.
The tree is really simple: there are indeed 7 participants in the control group (Node 2). Within the treatment group, the tree splits the participants into those with a pre-test vowel blindness score of less than or equal to 20, and those with a higher score. This model would be useful for prediction. We might test it against a larger sample.
Eight questions in the pre-test and the post-test question sets are defined as distractor questions. The vowel blindness scores for distractors for the pre-test and post-test scores are shown in the boxplot below. The pre-test distractor scores appear to have a higher mean compared to the post-test scores.
Using a paired t-test for differences in means shows a statistically significant difference. The post-test distractor scores are lower by on average 1.8 points.
##
## Paired t-test
##
## data: reem2$predist and reem2$postdist
## t = 5.332, df = 34, p-value = 6.359e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.114 2.486
## sample estimates:
## mean of the differences
## 1.8
Using linear regression
##
## Call:
## lm(formula = postdist ~ predist + recode, data = reem2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6044 -0.5765 -0.0456 0.8381 1.7295
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.4802 0.8230 5.44 7.4e-06 ***
## predist 0.0177 0.1336 0.13 0.89532
## recodeHLV -1.2806 0.6420 -1.99 0.05555 .
## recodeMorphology -2.4159 0.6479 -3.73 0.00083 ***
## recodeMorphonology -1.9924 0.6434 -3.10 0.00432 **
## recodePhonology -1.5765 0.6420 -2.46 0.02030 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.2 on 29 degrees of freedom
## Multiple R-squared: 0.364, Adjusted R-squared: 0.254
## F-statistic: 3.32 on 5 and 29 DF, p-value: 0.0172
The results are interesting because the pre-test distractor score is not statistically significant. The most significant variable is Morphology, with a p-value of 0.000832: compared with the control group, Morphology training reduced vowel blindness across the distractors by 2.41591 points on average. The HLV coefficient is not significant at the 5% level (p = 0.05555).
The list below shows the number of instances of vowel blindness by question on the pre-test and post-test, and the difference between them. The list is sorted by pre-test score, starting with the question with the worst vowel blindness score (31 out of 35), which was question 20. For the same question, the equivalent post-test score is 8, giving a difference of 23.
## Pre.VBScore Pre.Q Post.VBScore Post.Q Difference
## 1 31 Q20VB 8 Q20 23
## 2 28 Q25VB 20 Q25 8
## 3 27 Q11VB 9 Q11 18
## 4 27 Q17VB 15 Q17 12
## 5 27 Q26VB 26 Q26 1
## 6 25 Q5VB 5 Q5 20
## 7 25 Q30VB 13 Q30 12
## 8 25 Q38VB 10 Q38 15
## 9 24 Q2VB 24 Q2 0
## 10 24 Q14VB 13 Q14 11
## 11 24 Q19VB 11 Q19 13
## 12 24 Q28VB 8 Q28 16
## 13 23 Q4VB 10 Q4 13
## 14 23 Q15VB 13 Q15 10
## 15 23 Q22VB 12 Q22 11
## 16 23 Q32VB 16 Q32 7
## 17 22 Q21VB 20 Q21 2
## 18 22 Q37VB 17 Q37 5
## 19 21 Q13VB 6 Q13 15
## 20 21 Q18VB 9 Q18 12
## 21 21 Q36VB 5 Q36 16
## 22 20 Q10VB 17 Q10 3
## 23 20 Q29VB 19 Q29 1
## 24 20 Q40VB 16 Q40 4
## 25 19 Q9VB 19 Q9 0
## 26 19 Q12VB 14 Q12 5
## 27 19 Q27VB 4 Q27 15
## 28 18 Q3VB 14 Q3 4
## 29 17 Q7VB 20 Q7 -3
## 30 17 Q8VB 16 Q8 1
## 31 17 Q34VB 25 Q34 -8
## 32 17 Q35VB 17 Q35 0
## 33 16 Q23VB 18 Q23 -2
## 34 16 Q33VB 12 Q33 4
## 35 15 Q6VB 18 Q6 -3
## 36 15 Q39VB 6 Q39 9
## 37 13 Q1VB 16 Q1 -3
## 38 12 Q31VB 18 Q31 -6
## 39 8 Q16VB 24 Q16 -16
## 40 0 Q24VB 7 Q24 -7
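The 'Difference' column is simply the pre-test count minus the post-test count, so negative values mark questions where vowel blindness increased. A small sketch using five rows copied from the list above shows the calculation and the sort; the data frame name q and the vector worse are hypothetical.

```r
# Five rows copied from the list above
q <- data.frame(
  Question = c("Q20", "Q25", "Q11", "Q16", "Q24"),
  Pre      = c(31, 28, 27, 8, 0),
  Post     = c(8, 20, 9, 24, 7),
  stringsAsFactors = FALSE
)

# Pre-test count minus post-test count
q$Difference <- q$Pre - q$Post

# Sort by pre-test score, worst first (as in the list above)
q_sorted <- q[order(-q$Pre), ]

# Questions where vowel blindness was higher on the post-test
worse <- q$Question[q$Difference < 0]
```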
I developed a small piece of code to simplify the task of adding up the vowel blindness scores. This is much quicker than the manual method I had previously used and probably less likely to include human errors in tabulation. I went back through the pre-test and post-test results before adding them to the follow-up test results. I then produced: a boxplot of the vowel blindness scores for the three tests; a t-test comparing the means of the post-test and the follow-up test; and an ANOVA of the follow-up test by highlighting. First, a boxplot comparing the results of the three tests:
# Read the summed vowel blindness scores for each test
pretestsum <- read.csv("C:/Users/Stephen/Desktop/Reem/pretestsum.csv")
postestsum <- read.csv("C:/Users/Stephen/Desktop/Reem/postestsum.csv")
follupsum <- read.csv("C:/Users/Stephen/Desktop/Reem/follupsum.csv")
# Join the three tests on the participant identifier
pilot <- merge(pretestsum, postestsum, by = "users")
pilot1 <- merge(pilot, follupsum, by = "users")
p <- as.data.frame(pilot1)
# Reshape to long format (one row per score) for plotting
pstack <- stack(p)
names(pstack) <- c("VBScore", "Test")
boxplot(VBScore ~ Test, data = pstack, main = "VB Scores by Test")
By eye, it looks as though the follow-up and post-test results are very similar. We can use a paired t-test to see whether individual participants' vowel blindness scores changed:
t.test(p$followupsum, p$posttestsum, paired = TRUE)
##
## Paired t-test
##
## data: p$followupsum and p$posttestsum
## t = -1.451, df = 34, p-value = 0.156
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.7436 0.4579
## sample estimates:
## mean of the differences
## -1.143
The p-value of 0.156 indicates that there is no statistically significant difference. Finally, highlighting. First, here is a boxplot of follow-up scores by highlighting:
boxplot(p$followupsum ~ p$highlightvowel, main = "Effects of highlighting")
There appears to be little difference. Test this:
faov <- aov(p$followupsum ~ p$highlightvowel)
summary(faov)
## Df Sum Sq Mean Sq F value Pr(>F)
## p$highlightvowel 1 58 57.9 1.24 0.27
## Residuals 33 1544 46.8
So at follow-up, those who had highlighting did not differ significantly from those who did not (p = 0.27).
Below is a plot of the records of individuals. Event 1 is the pre-test score, event 2 is the post-test score, and event 3 is the follow-up score. The blue line is the overall trend line for all of the participants. Note that the post-test score for some individuals is higher than the pre-test score.
preemlong <- read.csv("C:/Users/Stephen/Desktop/Reem/preemlong.csv")
library(ggplot2)
pr <- ggplot(preemlong, aes(event, SCORE, group = id)) + geom_line()
pr1 <- pr + geom_smooth(aes(group = 1), size = 2, col = "blue", se = TRUE)
pr1
## geom_smooth: method="auto" and size of largest group is <1000, so using
## loess. Use 'method = x' to change the smoothing method.
Who are these individuals?
preem <- read.csv("C:/Users/Stephen/Desktop/Reem/preem.csv")
preemhigh <- subset(preem, subset = (POSTTEST > PRETEST))
## Error: object 'POSTTEST' not found
preemhigh
## Error: object 'preemhigh' not found
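The subset call above fails because preem has no column named POSTTEST, so the rows are not listed here; the column names in the condition need to match those in preem.csv. The individuals in question are all from the 'None' group and without vowel highlighting.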
Instead of looking at individuals, we can examine the change in vowel blindness over time grouped by treatment. The plot below shows the smoothed trend by group. The top line is for the 'None' group, who do not have vowel highlighting; their scores are essentially unchanged. The other four lines are for the other four groups, including the control group with highlighting (the next worst group). They show a similar pattern. I am working on getting a legend and colour-coding the lines.
pr <- ggplot(preemlong, aes(event, SCORE, group = id)) +
  geom_smooth(aes(group = GROUP1, color = GROUP1), se = FALSE, size = 2)
pr
We can do the same thing with vowel highlighting. The line which starts off highest is for those with highlighting. The improvement is dramatic.
ggplot(preemlong, aes(x = event, y = SCORE, group = id)) +
  geom_smooth(aes(group = hlv, color = hlv), se = FALSE, size = 2)
A highly useful property of natural logarithms is that regressing the log of one variable on the log of another gives the elasticity: the percentage change in the response associated with a one percent change in the predictor. Working in percentages means we don't need to worry about the units. I have converted the vowel blindness scores to logs and then performed the regression, using the 'None' group as the reference level.
preem <- read.csv("C:/Users/Stephen/Desktop/Reem/preem.csv")
preem$GROUP1 <- as.factor(preem$GROUP1)
preem$GROUP1 <- relevel(preem$GROUP1, ref = "None")
changeaftertest <- lm(LNFOLLOWUP ~ LNPRETEST + GROUP1, data = preem)
summary(changeaftertest)
##
## Call:
## lm(formula = LNFOLLOWUP ~ LNPRETEST + GROUP1, data = preem)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.6487 -0.1973 0.0546 0.1834 0.7717
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.473 1.039 2.38 0.02403 *
## LNPRETEST 0.239 0.321 0.74 0.46300
## GROUP1Morphology -0.722 0.202 -3.57 0.00126 **
## GROUP1Morphonology -0.770 0.192 -4.01 0.00039 ***
## GROUP1NoneHLV -0.830 0.191 -4.34 0.00016 ***
## GROUP1Phonology -0.708 0.193 -3.68 0.00095 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.357 on 29 degrees of freedom
## Multiple R-squared: 0.486, Adjusted R-squared: 0.397
## F-statistic: 5.47 on 5 and 29 DF, p-value: 0.00115
These results are really interesting. Notice the negative signs on each group coefficient: in every case the follow-up score (the response variable) was lower than for the reference level (the 'None' group). This fits with the graph we did above. Because the response is a logarithm and group membership is a 0/1 variable, a group coefficient b corresponds to a percentage change of 100 x (exp(b) - 1), not to b itself. So, for any given pre-test score, the follow-up score will be about 51% lower for participants given the Morphology treatment (exp(-0.722) - 1 = -0.51), and so on. From this it looks as though highlighting on its own is the most effective, with a reduction in vowel blindness of about 56% (coefficient -0.830), although the four group coefficients are close to one another. CAUTION! Although this model 'works', it explains only about 40% of the variance (see the adjusted R-squared value of 0.3969). There will be some other factors not in this model. A larger sample would give more precise estimates, though it would not necessarily raise the R-squared value.
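When the response is a natural logarithm and the predictor is a 0/1 group indicator, a coefficient b corresponds to a proportional change of exp(b) - 1 in the untransformed response. For small coefficients this is close to b itself, but for coefficients as large as these the two differ noticeably. The sketch below applies the conversion to the rounded estimates from the output above; the vector names b and pct_change are hypothetical.

```r
# Group coefficients copied (rounded) from the regression output above
b <- c(Morphology   = -0.722,
       Morphonology = -0.770,
       NoneHLV      = -0.830,
       Phonology    = -0.708)

# Percentage change in the follow-up score relative to the 'None' group,
# holding the (log) pre-test score constant
pct_change <- 100 * (exp(b) - 1)
round(pct_change, 1)

# Group with the largest estimated reduction
largest <- names(pct_change)[which.min(pct_change)]
largest
```

The ranking of the groups is unchanged by the conversion; only the size of the percentages differs from a direct reading of the coefficients.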